Zixin Luo

BlendedMVS: A Large-scale Dataset for Generalized Multi-view Stereo Networks

Nov 22, 2019

Yao Yao, Zixin Luo, Shiwei Li, Jingyang Zhang, Yufan Ren, Lei Zhou, Tian Fang, Long Quan

Abstract: While deep learning has recently achieved great success on multi-view stereo (MVS), limited training data makes the trained model hard to generalize to unseen scenarios. Compared with other computer vision tasks, it is rather difficult to collect a large-scale MVS dataset, as it requires expensive active scanners and a labor-intensive process to obtain ground-truth 3D structures. In this paper, we introduce BlendedMVS, a novel large-scale dataset, to provide sufficient training ground truth for learning-based MVS. To create the dataset, we apply a 3D reconstruction pipeline to recover high-quality textured meshes from images of well-selected scenes. Then, we render these mesh models to color images and depth maps. The rendered color images are further blended with the input images to generate photo-realistic blended images as the training input. Our dataset contains over 17k high-resolution images covering a variety of scenes, including cities, architecture, sculptures and small objects. Extensive experiments demonstrate that BlendedMVS endows the trained model with significantly better generalization ability compared with other MVS datasets. The entire dataset with pretrained models will be made publicly available at https://github.com/YoYo000/BlendedMVS.

Self-Supervised Learning of Depth and Motion Under Photometric Inconsistency

Sep 19, 2019

Tianwei Shen, Lei Zhou, Zixin Luo, Yao Yao, Shiwei Li, Jiahui Zhang, Tian Fang, Long Quan

Abstract: The self-supervised learning of depth and pose from monocular sequences is an attractive solution because, by using the photometric consistency of nearby frames, it depends much less on ground-truth data. In this paper, we address the case where the assumptions of previous self-supervised approaches are violated due to the dynamic nature of real-world scenes. Rather than handling the noise as uncertainty, our key idea is to incorporate more robust geometric quantities and enforce internal consistency in the temporal image sequence. As demonstrated on commonly used benchmark datasets, the proposed method substantially improves on state-of-the-art methods in both depth and relative pose estimation for monocular image sequences, without adding inference overhead.

* International Conference on Computer Vision (ICCV) Workshop 2019

Learning Two-View Correspondences and Geometry Using Order-Aware Network

Aug 14, 2019

Jiahui Zhang, Dawei Sun, Zixin Luo, Anbang Yao, Lei Zhou, Tianwei Shen, Yurong Chen, Long Quan, Hongen Liao

Abstract: Establishing correspondences between two images requires both local and global spatial context. Given putative correspondences of feature points in two views, in this paper, we propose Order-Aware Network, which infers the probabilities of correspondences being inliers and regresses the relative pose encoded by the essential matrix. Specifically, the proposed network is built hierarchically and comprises three novel operations. First, to capture the local context of sparse correspondences, the network clusters unordered input correspondences by learning a soft assignment matrix. These clusters are in a canonical order and invariant to input permutations. Next, the clusters are spatially correlated to form the global context of correspondences. After that, the context-encoded clusters are recovered back to the original size through a proposed upsampling operator. We experiment extensively on both outdoor and indoor datasets. The accuracy of both the two-view geometry and the correspondences is significantly improved over the state of the art. Code will be available at https://github.com/zjhthu/OANet.git.

* Accepted to ICCV 2019; winning solution to both tracks of the CVPR IMW 2019 Challenge. Code will be available soon at https://github.com/zjhthu/OANet.git
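
The soft-assignment clustering described in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: the function name `soft_pool`, the linear projection `cluster_weights`, and the choice to normalize the softmax over the input points are all assumptions made for illustration. The sketch only shows why pooling through a learned soft assignment yields clusters that do not depend on the order of the inputs.

```python
import math


def soft_pool(points, cluster_weights):
    """Pool N unordered points into M clusters via a soft assignment.

    points:          list of N feature vectors (lists of D floats).
    cluster_weights: list of M vectors of length D; a hypothetical stand-in
                     for the learned layer that produces assignment logits.
    """
    dim = len(points[0])
    clusters = []
    for weight in cluster_weights:
        # One logit per input point for this cluster.
        logits = [sum(w * x for w, x in zip(weight, p)) for p in points]
        # Numerically stable softmax over the points (assumed normalization).
        peak = max(logits)
        exps = [math.exp(v - peak) for v in logits]
        total = sum(exps)
        assign = [e / total for e in exps]
        # The cluster feature is the assignment-weighted average of the points.
        clusters.append(
            [sum(a * p[d] for a, p in zip(assign, points)) for d in range(dim)]
        )
    return clusters
```

Because each cluster is a weighted sum over all points, and each weight depends only on the point it belongs to, shuffling `points` changes the order of the summands but not the result, and the clusters always come out in the order fixed by `cluster_weights`.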

ContextDesc: Local Descriptor Augmentation with Cross-Modality Context

Apr 08, 2019

Zixin Luo, Tianwei Shen, Lei Zhou, Jiahui Zhang, Yao Yao, Shiwei Li, Tian Fang, Long Quan

Abstract: Most existing studies on learning local features focus on patch-based descriptions of individual keypoints while neglecting the spatial relations established by keypoint locations. In this paper, we go beyond the local detail representation by introducing context awareness to augment off-the-shelf local feature descriptors. Specifically, we propose a unified learning framework that leverages and aggregates the cross-modality contextual information, including (i) visual context from high-level image representation, and (ii) geometric context from 2D keypoint distribution. Moreover, we propose an effective N-pair loss that eschews the empirical hyper-parameter search and improves the convergence. The proposed augmentation scheme is lightweight compared with the raw local feature description, yet improves remarkably on several large-scale benchmarks with diversified scenes, which demonstrates both strong practicality and generalization ability in geometric matching applications.

* Accepted to CVPR 2019 (oral), supplementary materials included. (https://github.com/lzx551402/contextdesc)
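
For readers unfamiliar with N-pair losses, the following is a minimal sketch of the generic form: each anchor descriptor should score higher against its own positive than against the positives of every other pair in the batch. The abstract does not spell out the paper's exact variant, so the formula below is the textbook version, and the function names `dot` and `n_pair_loss` are illustrative assumptions.

```python
import math


def dot(a, b):
    """Inner product of two equal-length descriptor vectors."""
    return sum(x * y for x, y in zip(a, b))


def n_pair_loss(anchors, positives):
    """Generic N-pair loss over a batch of matching descriptor pairs.

    anchors[i] and positives[i] describe the same keypoint; positives[j]
    with j != i serve as negatives for anchors[i].
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        pos_score = dot(anchor, positives[i])
        # Sum of exponentiated margins against all non-matching positives.
        neg_sum = sum(
            math.exp(dot(anchor, positives[j]) - pos_score)
            for j in range(len(positives))
            if j != i
        )
        total += math.log(1.0 + neg_sum)
    return total / len(anchors)
```

Unlike a hinge-style triplet loss, this form has no margin to tune, which is consistent with the abstract's remark about avoiding an empirical hyper-parameter search.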

Recurrent MVSNet for High-resolution Multi-view Stereo Depth Inference

Feb 27, 2019

Yao Yao, Zixin Luo, Shiwei Li, Tianwei Shen, Tian Fang, Long Quan

Abstract: Deep learning has recently demonstrated excellent performance for multi-view stereo (MVS). However, one major limitation of current learned MVS approaches is scalability: the memory-consuming cost volume regularization makes learned MVS hard to apply to high-resolution scenes. In this paper, we introduce a scalable multi-view stereo framework based on the recurrent neural network. Instead of regularizing the entire 3D cost volume in one go, the proposed Recurrent Multi-view Stereo Network (R-MVSNet) sequentially regularizes the 2D cost maps along the depth direction via the gated recurrent unit (GRU). This dramatically reduces memory consumption and makes high-resolution reconstruction feasible. We first show the state-of-the-art performance achieved by the proposed R-MVSNet on recent MVS benchmarks. Then, we further demonstrate the scalability of the proposed method on several large-scale scenarios, where previous learned approaches often fail due to memory constraints. Code is available at https://github.com/YoYo000/MVSNet.

* Accepted by CVPR 2019
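
The idea of regularizing cost maps one depth slice at a time can be sketched with a scalar GRU applied to the costs of a single pixel. This is a deliberately reduced illustration, not the network from the paper: the weight names in `weights`, the scalar (rather than 2D map) state, and the zero initial state are assumptions. The point it shows is that only the current hidden state is carried from one depth slice to the next, so the full stack of slices never has to be processed at once.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(cost, hidden, weights):
    """One GRU update for a single pixel at a single depth hypothesis."""
    update = sigmoid(weights["wz"] * cost + weights["uz"] * hidden)
    reset = sigmoid(weights["wr"] * cost + weights["ur"] * hidden)
    candidate = math.tanh(weights["wh"] * cost + weights["uh"] * (reset * hidden))
    return (1.0 - update) * hidden + update * candidate


def regularize_along_depth(costs, weights):
    """Process per-depth matching costs sequentially, front to back.

    costs:   list of raw costs for one pixel, one entry per depth hypothesis.
    weights: dict of hypothetical scalar GRU parameters.
    """
    hidden = 0.0  # assumed initial state
    regularized = []
    for cost in costs:
        hidden = gru_step(cost, hidden, weights)
        regularized.append(hidden)
    return regularized
```

In this sketch the working state is a single number per pixel regardless of how many depth hypotheses are swept, which mirrors the memory argument made in the abstract.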

Beyond Photometric Loss for Self-Supervised Ego-Motion Estimation

Feb 25, 2019

Tianwei Shen, Zixin Luo, Lei Zhou, Hanyu Deng, Runze Zhang, Tian Fang, Long Quan

Abstract: Accurate relative pose is one of the key components in visual odometry (VO) and simultaneous localization and mapping (SLAM). Recently, the self-supervised learning framework that jointly optimizes the relative pose and target image depth has attracted the attention of the community. Previous works rely on the photometric error generated from depths and poses between adjacent frames, which contains large systematic errors in realistic scenes due to reflective surfaces and occlusions. In this paper, we bridge the gap between geometric loss and photometric loss by introducing the matching loss constrained by epipolar geometry in a self-supervised framework. Evaluated on the KITTI dataset, our method outperforms the state-of-the-art unsupervised ego-motion estimation methods by a large margin. The code and data are available at https://github.com/hlzz/DeepMatchVO.

* Accepted by ICRA 2019
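
As a rough illustration of a matching loss constrained by epipolar geometry, the sketch below measures how far each matched point in the second image lies from the epipolar line induced by its partner in the first image. The abstract does not give the paper's exact formulation, so the point-to-line distance, the averaging, and the function names `epipolar_distance` and `matching_loss` are assumptions.

```python
import math


def epipolar_distance(fundamental, point1, point2):
    """Distance from point2 to the epipolar line of point1.

    fundamental: 3x3 matrix (list of rows) mapping points in image 1
                 to epipolar lines in image 2.
    point1, point2: (x, y) pixel coordinates of a putative match.
    """
    x1 = (point1[0], point1[1], 1.0)
    x2 = (point2[0], point2[1], 1.0)
    # Epipolar line l = F * x1 in homogeneous form (a, b, c).
    line = [sum(fundamental[r][c] * x1[c] for c in range(3)) for r in range(3)]
    norm = math.hypot(line[0], line[1])
    if norm == 0.0:
        raise ValueError("degenerate epipolar line")
    return abs(sum(l * x for l, x in zip(line, x2))) / norm


def matching_loss(fundamental, matches):
    """Mean epipolar distance over a list of (point1, point2) matches."""
    return sum(epipolar_distance(fundamental, p, q) for p, q in matches) / len(matches)
```

For a camera translating purely along the x axis with identity intrinsics, `fundamental = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]` makes the epipolar lines horizontal, so a match that stays on the same image row incurs zero loss.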

Matchable Image Retrieval by Learning from Surface Reconstruction

Dec 10, 2018

Tianwei Shen, Zixin Luo, Lei Zhou, Runze Zhang, Siyu Zhu, Tian Fang, Long Quan

Abstract: Convolutional Neural Networks (CNNs) have achieved superior performance on object image retrieval, while Bag-of-Words (BoW) models with handcrafted local features still dominate the retrieval of overlapping images in 3D reconstruction. In this paper, we narrow this gap by presenting an efficient CNN-based method to retrieve images with overlaps, which we refer to as the matchable image retrieval problem. Different from previous methods that generate training data based on sparse reconstruction, we create a large-scale image database with rich 3D geometry and exploit information from surface reconstruction to obtain fine-grained training data. We propose a batched triplet-based loss function combined with mesh re-projection to effectively learn the CNN representation. The proposed method significantly accelerates the image retrieval process in 3D reconstruction and outperforms the state-of-the-art CNN-based and BoW methods for matchable image retrieval. The code and data are available at https://github.com/hlzz/mirror.

* Accepted by ACCV 2018

GeoDesc: Learning Local Descriptors by Integrating Geometry Constraints

Aug 16, 2018

Zixin Luo, Tianwei Shen, Lei Zhou, Siyu Zhu, Runze Zhang, Yao Yao, Tian Fang, Long Quan

Abstract: Learned local descriptors based on Convolutional Neural Networks (CNNs) have achieved significant improvements on patch-based benchmarks, but have not demonstrated strong generalization ability on recent benchmarks of image-based 3D reconstruction. In this paper, we mitigate this limitation by proposing a novel local descriptor learning approach that integrates geometry constraints from multi-view reconstructions, which benefits the learning process in terms of data generation, data sampling and loss computation. We refer to the proposed descriptor as GeoDesc, and demonstrate its superior performance on various large-scale benchmarks, and in particular show its great success on challenging reconstruction tasks. Moreover, we provide guidelines towards practical integration of learned descriptors in Structure-from-Motion (SfM) pipelines, showing the good trade-off between accuracy and efficiency that GeoDesc delivers to 3D reconstruction tasks.

* Accepted to ECCV'18

MVSNet: Depth Inference for Unstructured Multi-view Stereo

Jul 17, 2018

Yao Yao, Zixin Luo, Shiwei Li, Tian Fang, Long Quan

Abstract: We present an end-to-end deep learning architecture for depth map inference from multi-view images. In the network, we first extract deep visual image features, and then build the 3D cost volume upon the reference camera frustum via the differentiable homography warping. Next, we apply 3D convolutions to regularize and regress the initial depth map, which is then refined with the reference image to generate the final output. Our framework flexibly adapts to arbitrary N-view inputs using a variance-based cost metric that maps multiple features into one cost feature. The proposed MVSNet is demonstrated on the large-scale indoor DTU dataset. With simple post-processing, our method not only significantly outperforms previous state-of-the-art methods, but is also several times faster in runtime. We also evaluate MVSNet on the complex outdoor Tanks and Temples dataset, where our method ranks first before April 18, 2018 without any fine-tuning, showing the strong generalization ability of MVSNet.

* Accepted to European Conference on Computer Vision (ECCV 2018)
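
The variance-based cost metric mentioned in the abstract above can be sketched for a single location in the cost volume: given one feature vector per view, the cost feature is the element-wise variance across views. The function name `variance_cost` and the plain-list representation are assumptions for illustration; the paper operates on full feature volumes rather than individual vectors.

```python
def variance_cost(view_features):
    """Element-wise variance across N per-view feature vectors.

    view_features: list of N feature vectors (lists of floats of equal
                   length), one per view, sampled at the same 3D location.
    Returns a single cost vector of the same length.
    """
    count = len(view_features)
    if count < 2:
        raise ValueError("need at least two views")
    dim = len(view_features[0])
    means = [sum(f[i] for f in view_features) / count for i in range(dim)]
    return [
        sum((f[i] - means[i]) ** 2 for f in view_features) / count
        for i in range(dim)
    ]
```

Because the variance is symmetric in its inputs and defined for any number of samples, the same function accepts two views or ten without modification, which is the property the abstract highlights. Low variance indicates that the views agree at that location.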

Learning and Matching Multi-View Descriptors for Registration of Point Clouds

Jul 16, 2018

Lei Zhou, Siyu Zhu, Zixin Luo, Tianwei Shen, Runze Zhang, Mingmin Zhen, Tian Fang, Long Quan

Abstract: Critical to the registration of point clouds is the establishment of a set of accurate correspondences between points in 3D space. The correspondence problem is generally addressed by the design of discriminative 3D local descriptors on the one hand, and the development of robust matching strategies on the other hand. In this work, we first propose a multi-view local descriptor, which is learned from the images of multiple views, for the description of 3D keypoints. Then, we develop a robust matching approach that rejects outlier matches through efficient inference via belief propagation on the defined graphical model. We demonstrate the improvement our approaches bring to registration on public scanning and multi-view stereo datasets. The superior performance is verified by extensive comparisons against a variety of descriptors and matching methods.
