Wenbing Tao

JSNet: Joint Instance and Semantic Segmentation of 3D Point Clouds

Dec 20, 2019
Lin Zhao, Wenbing Tao

In this paper, we propose JSNet, a novel joint instance and semantic segmentation approach that addresses the instance and semantic segmentation of 3D point clouds simultaneously. First, we build an effective backbone network to extract robust features from the raw point clouds. Second, to obtain more discriminative features, a point cloud feature fusion module is proposed to fuse the features from different layers of the backbone network. Furthermore, a joint instance semantic segmentation module is developed to transform semantic features into instance embedding space, and the transformed features are then fused with instance features to facilitate instance segmentation. Meanwhile, this module also aggregates instance features into semantic feature space to promote semantic segmentation. Finally, the instance predictions are generated by applying a simple mean-shift clustering on instance embeddings. We evaluate the proposed JSNet on the large-scale 3D indoor point cloud dataset S3DIS and the part dataset ShapeNet, and compare it with existing approaches. Experimental results demonstrate that our approach outperforms the state-of-the-art method in 3D instance segmentation, with a significant improvement in 3D semantic prediction, and that our method is also beneficial for part segmentation. The source code for this work is available at https://github.com/dlinzhao/JSNet.
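
The final step of the abstract, grouping points by mean-shift clustering on instance embeddings, can be illustrated with a minimal sketch. This is not the JSNet implementation: the function name `mean_shift`, the flat kernel, the `bandwidth` value, and the mode-merging threshold are all assumptions chosen for illustration, and the "embeddings" here are toy 2D points.

```python
import numpy as np

def mean_shift(embeddings, bandwidth, n_iter=50, tol=1e-5):
    """Flat-kernel mean shift (illustrative); returns labels and cluster centres."""
    points = np.asarray(embeddings, dtype=float)
    modes = points.copy()  # every point starts as its own candidate mode
    for _ in range(n_iter):
        # Distances between the current modes and the original points.
        dists = np.linalg.norm(modes[:, None, :] - points[None, :, :], axis=2)
        within = dists <= bandwidth  # flat kernel window
        # Move each mode to the mean of the points inside its window.
        new_modes = (within[:, :, None] * points[None, :, :]).sum(axis=1)
        new_modes /= within.sum(axis=1, keepdims=True)
        shift = np.abs(new_modes - modes).max()
        modes = new_modes
        if shift < tol:
            break
    # Points whose modes converged to (nearly) the same place share a label.
    labels = np.full(len(points), -1, dtype=int)
    centers = []
    for i, mode in enumerate(modes):
        for k, center in enumerate(centers):
            if np.linalg.norm(mode - center) < bandwidth / 2:  # assumed merge threshold
                labels[i] = k
                break
        else:
            centers.append(mode)
            labels[i] = len(centers) - 1
    return labels, np.array(centers)

# Toy "embeddings": two well-separated groups of four points each.
toy = [[0, 0], [0.1, 0], [0, 0.1], [0.1, 0.1],
       [5, 5], [5.1, 5], [5, 5.1], [5.1, 5.1]]
labels, centers = mean_shift(toy, bandwidth=1.0)
print(labels)
```

In a real pipeline the input would be the per-point embedding vectors predicted by the network, and the number of instances falls out of the clustering rather than being fixed in advance.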

* Accepted by AAAI2020

IoU-uniform R-CNN: Breaking Through the Limitations of RPN

Dec 11, 2019
Li Zhu, Zihao Xie, Liman Liu, Bo Tao, Wenbing Tao

The Region Proposal Network (RPN) is the cornerstone of two-stage object detectors: it generates a sparse set of object proposals and alleviates the extreme foreground-background class imbalance problem during training. However, we find that the potential of the detector has not been fully exploited due to the IoU distribution imbalance and inadequate quantity of the training samples generated by RPN. As the intersection over union (IoU) increases, the number of positive samples shrinks exponentially, which skews the distribution towards lower IoUs and hinders the optimization of the detector at high IoU levels. In this paper, to break through the limitations of RPN, we propose IoU-Uniform R-CNN, a simple but effective method that directly generates training samples with a uniform IoU distribution for the regression branch as well as the IoU prediction branch. In addition, we improve the performance of the IoU prediction branch by eliminating the feature offsets of RoIs at inference, which helps the NMS procedure preserve accurately localized bounding boxes. Extensive experiments on the PASCAL VOC and MS COCO datasets show the effectiveness of our method, as well as its compatibility and adaptivity to many object detection architectures. The code is made publicly available at https://github.com/zl1994/IoU-Uniform-R-CNN.
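
To make the idea of training samples with a uniform IoU distribution concrete, here is a small sketch. It is not the sampling procedure from the paper: it assumes two simple closed-form perturbations of a ground-truth box (a horizontal shift and a shrink about the centre), and the names `box_with_iou` and `uniform_iou_samples` as well as the IoU range of 0.5 to 1.0 are hypothetical.

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def box_with_iou(gt, target, mode):
    """Build a box whose IoU with `gt` equals `target` (0 < target <= 1)."""
    x1, y1, x2, y2 = gt
    w, h = x2 - x1, y2 - y1
    if mode == "shift":
        # Horizontal shift by dx gives IoU = (w - dx) / (w + dx).
        dx = w * (1.0 - target) / (1.0 + target)
        return (x1 + dx, y1, x2 + dx, y2)
    # Shrinking about the centre by factor s gives IoU = s**2.
    s = np.sqrt(target)
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    return (cx - s * w / 2, cy - s * h / 2, cx + s * w / 2, cy + s * h / 2)

def uniform_iou_samples(gt, n, low=0.5, high=1.0, seed=0):
    """Draw n boxes whose IoUs with `gt` are uniform on [low, high)."""
    rng = np.random.default_rng(seed)
    targets = rng.uniform(low, high, size=n)          # assumed IoU range
    modes = rng.choice(["shift", "shrink"], size=n)   # assumed perturbation types
    boxes = [box_with_iou(gt, t, m) for t, m in zip(targets, modes)]
    return boxes, targets

gt_box = (10.0, 20.0, 110.0, 70.0)
boxes, targets = uniform_iou_samples(gt_box, n=5)
for box, t in zip(boxes, targets):
    print(round(float(t), 3), round(float(iou(box, gt_box)), 3))
```

Because the IoU is controlled directly, high-IoU samples are as frequent as low-IoU ones, which is the property the abstract contrasts with the skewed distribution of RPN proposals.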

Localization-aware Channel Pruning for Object Detection

Nov 21, 2019
Zihao Xie, Wenbing Tao, Li Zhu, Lin Zhao

Channel pruning is one of the important methods for deep model compression. Most existing pruning methods focus mainly on classification, and few of them conduct systematic research on object detection. However, object detection differs from classification in that it requires not only semantic information but also localization information. In this paper, building on DCP (Zhuang et al., 2018), a state-of-the-art pruning method for classification, we propose a localization-aware auxiliary network to find the channels with key information for classification and regression, so that we can conduct channel pruning directly for object detection, which saves a great deal of time and computing resources. To capture the localization information, we first design the auxiliary network with a contextual ROIAlign layer, which obtains precise localization information of the default boxes by pixel alignment and enlarges the receptive fields of the default boxes when pruning shallow layers. Then, we construct a loss function for the object detection task that tends to keep the channels containing the key information for classification and regression. Extensive experiments demonstrate the effectiveness of our method. On MS COCO, we prune 70% of the parameters of the SSD based on ResNet-50 with a modest accuracy drop, which outperforms the state-of-the-art method.

TSRNet: Scalable 3D Surface Reconstruction Network for Point Clouds using Tangent Convolution

Nov 18, 2019
Zhenxing Mi, Yiming Luo, Wenbing Tao

Existing learning-based methods for surface reconstruction from point clouds still face challenges in scalability and in preserving details on large-scale point clouds. In this paper, we propose TSRNet, a novel scalable learning-based method for surface reconstruction. It first takes a point cloud and its related octree vertices as input and learns to classify whether the octree vertices are in front of or behind the implicit surface. Marching Cubes (MC) is then applied to extract a surface from the binary labeled octree. The pipeline is scalable because it does not consider the whole input data at once: the point cloud and octree vertices can be divided and the different parts processed in parallel. Our network captures local geometry details by constructing local geometry-aware features for octree vertices. These features greatly enhance the prediction accuracy for the relative position between the vertices and the implicit surface. They also boost the generalization capability of our network. Our method is able to reconstruct local geometry details from point clouds of different scales, especially point clouds with millions of points. More importantly, the time consumption on such point clouds is acceptable and competitive. Experiments show that our method achieves a significant breakthrough in scalability and quality compared with state-of-the-art learning-based methods.

* 10 pages, 6 figures

GLA-Net: An Attention Network with Guided Loss for Mismatch Removal

Sep 28, 2019
Zhi Chen, Fan Yang, Wenbing Tao

Mismatch removal is a critical prerequisite in many feature-based tasks. Recent attempts cast the mismatch removal task as a binary classification problem and solve it with deep learning based methods. In these methods, the imbalance between positive and negative classes is an important factor that affects network performance, as measured by the Fn-score. To establish the link between Fn-score and loss, we propose to guide the loss with the Fn-score directly. We theoretically demonstrate the direct link between our Guided Loss and Fn-score during training. Moreover, we discover that outliers often impair global context in mismatch removal networks. To address this issue, we introduce the attention mechanism to the mismatch removal task and propose a novel Inlier Attention Block (IA Block). To evaluate the effectiveness of our loss and IA Block, we design an end-to-end network for mismatch removal, called GLA-Net (our code will be made available on GitHub later). Experiments show that our network achieves state-of-the-art performance on benchmark datasets.
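
The role of class imbalance can be seen from the metric itself. The sketch below assumes that the Fn-score is the usual F-measure with weighting parameter n (often written F-beta); it only computes that metric and does not reproduce the Guided Loss. The function name `f_score` and the toy labels are hypothetical.

```python
def f_score(labels, preds, n=1.0):
    """F-measure with weight n for binary labels (1 = inlier, 0 = outlier)."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0  # no correct inlier prediction: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    n2 = n ** 2
    return (1 + n2) * precision * recall / (n2 * precision + recall)

# 10 inliers among 100 putative matches: predicting "all outliers"
# reaches 90% accuracy but an F-score of 0.
labels = [1] * 10 + [0] * 90
all_negative = [0] * 100
accuracy = sum(y == p for y, p in zip(labels, all_negative)) / len(labels)
print(accuracy, f_score(labels, all_negative))
```

A loss that tracks plain accuracy can therefore look good while the F-measure collapses, which is the gap that guiding the loss by the score is meant to close.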

Multi-Scale Geometric Consistency Guided Multi-View Stereo

Apr 17, 2019
Qingshan Xu, Wenbing Tao

In this paper, we propose an efficient multi-scale geometric consistency guided multi-view stereo method for accurate and complete depth map estimation. We first present our basic multi-view stereo method with Adaptive Checkerboard sampling and Multi-Hypothesis joint view selection (ACMH). It leverages structured region information to sample better candidate hypotheses for propagation and to infer the aggregation view subset at each pixel. For the depth estimation of low-textured areas, we further propose to combine ACMH with multi-scale geometric consistency guidance (ACMM) to obtain reliable depth estimates for low-textured areas at coarser scales and to guarantee that they can be propagated to finer scales. To correct the erroneous estimates propagated from the coarser scales, we present a novel detail restorer. Experiments on extensive datasets show that our method achieves state-of-the-art performance, recovering depth estimates not only in low-textured areas but also in details.

* Accepted by CVPR2019

GPU Accelerated Cascade Hashing Image Matching for Large Scale 3D Reconstruction

May 23, 2018
Tao Xu, Kun Sun, Wenbing Tao

Image feature point matching is a key step in Structure from Motion (SfM). However, it is becoming more and more time consuming as the number of images grows. In this paper, we propose a GPU accelerated image matching method with improved Cascade Hashing. First, we propose a Disk-Memory-GPU data exchange strategy and optimize the load order of data, so that the proposed method can deal with big data. Next, we parallelize the Cascade Hashing method on GPU. An improved parallel reduction and an improved parallel hashing ranking are proposed to fulfill this task. Finally, extensive experiments show that our image matching is about 20 times faster than SiftGPU on the same graphics card, nearly 100 times faster than the CPU CasHash method and hundreds of times faster than the CPU Kd-Tree based matching method. Furthermore, we introduce the epipolar constraint to the proposed method and use the epipolar geometry to guide the feature matching procedure, which further reduces the matching cost.
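
The general pattern behind hashing-based matching (binary codes, a Hamming-distance short list, then an exact re-ranking) can be sketched on the CPU as follows. This is a simplified illustration, not the Cascade Hashing algorithm or its GPU parallelization: the random-projection codes, the code length `n_bits`, the short-list size `top_k`, and all function names are assumptions.

```python
import numpy as np

def binary_codes(descriptors, projections):
    """One bit per random projection: the sign of the projected value."""
    return (descriptors @ projections) > 0

def hamming_rank(query_code, db_codes, top_k):
    """Indices of the top_k database codes closest in Hamming distance."""
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    order = np.argsort(dists, kind="stable")[:top_k]
    return order, dists[order]

def match(query, db, n_bits=128, top_k=4, seed=0):
    """Return the index of the best database match for one query descriptor."""
    rng = np.random.default_rng(seed)
    # Assumes descriptors are roughly zero-centred so that signs are informative.
    proj = rng.standard_normal((db.shape[1], n_bits))
    db_codes = binary_codes(db, proj)
    q_code = binary_codes(query[None, :], proj)[0]
    candidates, _ = hamming_rank(q_code, db_codes, top_k)
    # Re-rank the short list with exact Euclidean distance.
    eucl = np.linalg.norm(db[candidates] - query, axis=1)
    return candidates[np.argmin(eucl)]

rng = np.random.default_rng(1)
database = rng.standard_normal((50, 32))           # toy stand-in for image descriptors
noisy_query = database[7] + 0.01 * rng.standard_normal(32)
print(match(noisy_query, database))
```

The Hamming ranking touches only compact binary codes, so the expensive floating-point comparison is limited to a handful of candidates; this kind of per-query ranking is naturally data-parallel, which is what makes a GPU port attractive.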

Multi-View Stereo with Asymmetric Checkerboard Propagation and Multi-Hypothesis Joint View Selection

May 21, 2018
Qingshan Xu, Wenbing Tao

In the computer vision domain, performing multi-view stereo (MVS) quickly and accurately is still a challenging problem. In this paper, we present a fast yet accurate method for 3D dense reconstruction, called AMHMVS, built on the PatchMatch based stereo algorithm. Unlike the regular symmetric propagation scheme, our approach adopts an asymmetric checkerboard propagation strategy, which adaptively lets effective hypotheses expand further according to the confidence of current neighbor hypotheses. To better aggregate visual information from multiple images, we propose multi-hypothesis joint view selection for each pixel, which leverages a cost matrix based on the multiple propagated hypotheses to robustly infer an appropriate aggregation subset in parallel. Combining the above two steps, our approach not only has the capacity for massively parallel computation, but also obtains high accuracy and completeness. Experiments on extensive datasets show that our method achieves more accurate and robust results, and runs faster than the competing methods.
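
Why a checkerboard layout suits massively parallel propagation can be shown with a few lines. The sketch below only builds the generic red-black pixel partition and checks that no two 4-connected neighbours share a colour; it does not implement the asymmetric propagation or view selection described above, and the function names are hypothetical.

```python
import numpy as np

def checkerboard_masks(height, width):
    """Split the pixel grid into 'red' and 'black' cells like a checkerboard."""
    rows, cols = np.indices((height, width))
    red = (rows + cols) % 2 == 0
    return red, ~red

def has_same_colour_neighbour(mask):
    """True if any two horizontally or vertically adjacent pixels are both in `mask`."""
    horizontal = (mask[:, :-1] & mask[:, 1:]).any()
    vertical = (mask[:-1, :] & mask[1:, :]).any()
    return bool(horizontal or vertical)

red, black = checkerboard_masks(4, 5)
print(red.astype(int))
print(has_same_colour_neighbour(red), has_same_colour_neighbour(black))
```

Since every red pixel's direct neighbours are black, all red pixels can be updated at once from black hypotheses, and the two colours can then swap roles in the next half-iteration.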

Trilaminar Multiway Reconstruction Tree for Efficient Large Scale Structure from Motion

Dec 21, 2016
Kun Sun, Wenbing Tao

Accuracy and efficiency are two key problems in large scale incremental Structure from Motion (SfM). In this paper, we propose a unified framework that divides the image set into clusters suitable for reconstruction and finds multiple reliable and stable starting points. Image partitioning is performed in two steps. First, some small image groups are selected at places with high image density; then all the images are clustered according to their optimal reconstruction paths to these image groups. This ensures that the scene is always reconstructed from dense places to sparse areas, which can reduce error accumulation when images have weak overlap. To increase speed, images outside the selected group in each cluster are further divided to achieve a greater degree of parallelism. Experiments show that our method achieves significant speedup, higher accuracy and better completeness.

* This manuscript has been submitted to CVPR 2017
