Jiaxin Wei

RGB-based Category-level Object Pose Estimation via Decoupled Metric Scale Recovery

Sep 19, 2023
Jiaxin Wei, Xibin Song, Weizhe Liu, Laurent Kneip, Hongdong Li, Pan Ji

While showing promising results, recent RGB-D camera-based category-level object pose estimation methods have restricted applications due to their heavy reliance on depth sensors. RGB-only methods provide an alternative but suffer from the inherent scale ambiguity of monocular observations. In this paper, we propose a novel pipeline that decouples 6D pose and size estimation to mitigate the influence of imperfect scales on rigid transformations. Specifically, we leverage a pre-trained monocular estimator to extract local geometric information, mainly to facilitate the search for inlier 2D-3D correspondences. Meanwhile, a separate branch is designed to directly recover the metric scale of the object from category-level statistics. Finally, we advocate using the RANSAC-P$n$P algorithm to robustly solve for the 6D object pose. Extensive experiments on both synthetic and real datasets demonstrate the superior performance of our method over previous state-of-the-art RGB-based approaches, especially in terms of rotation accuracy.
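
The decoupled solve can be illustrated with a short sketch: assuming a network predicts normalized object-space coordinates (e.g., NOCS-style) for detected 2D pixels and a separate branch regresses the metric scale, the pose then follows from OpenCV's RANSAC-PnP. All function and variable names below are hypothetical, not the paper's actual interface.

```python
import numpy as np
import cv2

def solve_pose_decoupled(pts_2d, pts_3d_normalized, metric_scale, K):
    """Recover a 6D pose from 2D-3D correspondences after restoring
    metric scale. `pts_3d_normalized` are object-space coordinates in a
    normalized (unit-scale) frame, e.g. from a NOCS-style prediction;
    `metric_scale` is the per-object scale regressed by a separate branch.
    Names are illustrative, not the paper's actual interface."""
    # Rescale the normalized model points to metric units so that the
    # translation recovered by PnP is metrically correct.
    pts_3d = pts_3d_normalized * metric_scale

    # Robustly solve PnP with RANSAC to reject outlier correspondences.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts_3d.astype(np.float64),
        pts_2d.astype(np.float64),
        K.astype(np.float64),
        distCoeffs=None,
        reprojectionError=3.0,
        iterationsCount=200,
        flags=cv2.SOLVEPNP_EPNP,
    )
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)  # rotation vector -> 3x3 rotation matrix
    return R, tvec.ravel(), inliers
```

Rescaling the model points before the solve keeps the recovered translation in metric units, while RANSAC filters outlier correspondences so that imperfect scale predictions do not corrupt the rotation estimate.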

Cross-modal Place Recognition in Image Databases using Event-based Sensors

Jul 03, 2023
Xiang Ji, Jiaxin Wei, Yifu Wang, Huiliang Shang, Laurent Kneip

Visual place recognition (VPR) is an important problem for global localization in many robotics tasks. One of the biggest challenges is robustness to illumination or appearance changes in the surrounding environment. Event cameras are interesting alternatives to frame-based sensors, as their high dynamic range enables robust perception under difficult illumination conditions. However, current event-based place recognition methods rely on event information alone, which restricts downstream applications of VPR. In this paper, we present the first cross-modal visual place recognition framework that is capable of retrieving regular images from a database given an event query. Our method demonstrates promising results against state-of-the-art frame-based and event-based methods on the Brisbane-Event-VPR dataset under different scenarios. We also verify the effectiveness of combining retrieval and classification, which boosts performance by a large margin.
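
As a minimal illustration of the cross-modal retrieval step, assuming an event encoder and a frame encoder have already been trained to map both modalities into a shared embedding space, retrieval reduces to nearest-neighbor search by cosine similarity. The names below are placeholders, not the paper's actual networks.

```python
import numpy as np

def retrieve_images(event_embedding, image_embeddings, top_k=5):
    """Nearest-neighbor retrieval in a shared embedding space.
    `event_embedding` (D,) comes from an event-stream encoder and
    `image_embeddings` (N, D) from a frame encoder trained to embed
    both modalities jointly; both encoders are assumed here."""
    # Cosine similarity = dot product of L2-normalized vectors.
    q = event_embedding / np.linalg.norm(event_embedding)
    db = image_embeddings / np.linalg.norm(image_embeddings, axis=1, keepdims=True)
    sims = db @ q
    # Indices of the top-k most similar database images.
    return np.argsort(-sims)[:top_k]
```

A classification head over place labels can then be fused with these retrieval scores, which is the retrieval-plus-classification combination the abstract reports as boosting performance.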

Incremental Semantic Localization using Hierarchical Clustering of Object Association Sets

Aug 28, 2022
Lan Hu, Zhongwei Luo, Runze Yuan, Yuchen Cao, Jiaxin Wei, Kai Wang, Laurent Kneip

We present a novel approach to relocalization or place recognition, a fundamental problem in many robotics, automation, and AR applications. Rather than relying on often unstable appearance information, we consider a setting in which the reference map is given in the form of localized objects. Our localization framework relies on 3D semantic object detections, which are then associated with objects in the map. Possible pairwise association sets are grown by hierarchical clustering using a merge metric that evaluates spatial compatibility. The latter notably uses information about relative object configurations, which is invariant to global transformations. Association sets are furthermore updated and expanded as the camera incrementally explores the environment and detects further objects. We test our algorithm in several challenging situations, including dynamic scenes, large viewpoint changes, and scenes with repeated instances. Our experiments demonstrate that our approach outperforms prior art in terms of both robustness and accuracy.
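
A minimal sketch of the spatial-compatibility idea, assuming each association pairs a detected object centroid with a map object centroid: since inter-object distances are invariant to any global rigid transformation, two associations can belong to the same set only if they preserve those distances. The merge metric and the greedy loop below are simplified stand-ins for the paper's hierarchical scheme.

```python
import numpy as np

def pairwise_compatible(assoc_a, assoc_b, tol=0.2):
    """Spatial compatibility of two detection-to-map associations.
    Each association is a (p_detection, p_map) pair of 3D object
    centroids. Inter-object distances are preserved by any global
    rigid transformation, so compatible associations must agree on
    them. A hypothetical stand-in for the paper's merge metric."""
    (d1, m1), (d2, m2) = assoc_a, assoc_b
    dist_det = np.linalg.norm(d1 - d2)  # distance between detected objects
    dist_map = np.linalg.norm(m1 - m2)  # distance between map objects
    return abs(dist_det - dist_map) < tol

def grow_association_set(associations, tol=0.2):
    """Greedily keep a mutually compatible subset of associations
    (a toy replacement for hierarchical clustering of association sets)."""
    kept = []
    for a in associations:
        if all(pairwise_compatible(a, b, tol) for b in kept):
            kept.append(a)
    return kept
```

In the actual method, compatible sets are grown hierarchically and updated incrementally as new objects are detected; the greedy loop above only conveys the invariance argument behind the merge metric.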

Accurate Instance-Level CAD Model Retrieval in a Large-Scale Database

Jul 04, 2022
Jiaxin Wei, Lan Hu, Chenyu Wang, Laurent Kneip

We present a new solution to the fine-grained retrieval of clean CAD models from a large-scale database in order to recover detailed object shape geometries for RGB-D scans. Unlike previous work, which simply indexes into a moderately small database using an object shape descriptor and accepts the top retrieval result, we argue that in the case of a large-scale database a more accurate model may be found within a neighborhood of the descriptor. More importantly, we propose that the instance-level distinctiveness deficiency of shape descriptors can be compensated by a geometry-based re-ranking of their neighborhood. Our approach first leverages the discriminative power of learned representations to distinguish between different categories of models, and then uses a novel robust point-set distance metric to re-rank the CAD neighborhood, enabling fine-grained retrieval in a large shape database. Evaluation on a real-world dataset shows that our geometry-based re-ranking is a conceptually simple but highly effective method that can lead to a significant improvement in retrieval accuracy compared to the state of the art.
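
The two-stage retrieve-then-re-rank pipeline can be sketched as follows, using a truncated one-sided Chamfer distance as a simple stand-in for the paper's robust point-set metric; descriptors are assumed to be L2-normalized, and all names are illustrative.

```python
import numpy as np

def robust_point_set_distance(P, Q, trunc=0.1):
    """Truncated one-sided Chamfer distance between point sets P (M,3)
    and Q (N,3): a simple robust stand-in for the paper's metric.
    Truncation caps the influence of outlier points."""
    # For each point in P, distance to its nearest neighbor in Q.
    d = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1).min(axis=1)
    return np.minimum(d, trunc).mean()

def retrieve_and_rerank(query_desc, query_pts, db_descs, db_models, k=20):
    """Stage 1: shortlist k CAD models by descriptor similarity.
    Stage 2: re-rank the shortlist by geometric distance to the scan."""
    sims = db_descs @ (query_desc / np.linalg.norm(query_desc))
    shortlist = np.argsort(-sims)[:k]  # descriptor neighborhood
    scores = [robust_point_set_distance(query_pts, db_models[i])
              for i in shortlist]
    return shortlist[np.argsort(scores)]  # best geometric match first
```

Because the descriptor stage only needs to be category-discriminative, the comparatively expensive geometric distance is evaluated on a shortlist of k candidates rather than on the full database.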

* Accepted by IROS 2022 

Spotlights: Probing Shapes from Spherical Viewpoints

May 25, 2022
Jiaxin Wei, Lige Liu, Ran Cheng, Wenqing Jiang, Minghao Xu, Xinyu Jiang, Tao Sun, Soren Schwertfeger, Laurent Kneip

Recent years have witnessed a surge of learned representations that build directly upon point clouds. Though increasingly expressive, most existing representations still struggle to generate ordered point sets. Inspired by spherical multi-view scanners, we propose a novel sampling model called Spotlights that represents a 3D shape as a compact 1D array of depth values. It simulates a configuration of cameras evenly distributed on a sphere, where each virtual camera casts light rays from its principal point through sample points on a small concentric spherical cap to probe for possible intersections with the object enclosed by the sphere. The structured point cloud is hence given implicitly as a function of depths. We provide a detailed geometric analysis of this new sampling scheme and prove its effectiveness in the context of point cloud completion. Experimental results on both synthetic and real data demonstrate that our method achieves competitive accuracy and consistency while having a significantly reduced computational cost. Furthermore, we show superior performance over state-of-the-art completion methods on the downstream point cloud registration task.
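
A rough sketch of the sampling scheme, assuming `trimesh` for ray-mesh intersection: cameras are placed on a Fibonacci lattice over the sphere, and each casts rays toward jittered points around the antipodal direction, approximating the small spherical cap. Parameters and the cap construction are illustrative, not the paper's exact geometry.

```python
import numpy as np
import trimesh

def spotlights_depths(mesh, n_cams=32, rays_per_cam=16, cap_radius=0.15):
    """Represent a mesh (assumed to fit inside the unit sphere) as a
    fixed-length 1D array of ray-hit depths; misses stay NaN."""
    # Fibonacci lattice: roughly even camera placement on the sphere.
    i = np.arange(n_cams)
    phi = np.arccos(1.0 - 2.0 * (i + 0.5) / n_cams)
    theta = np.pi * (1.0 + 5 ** 0.5) * i
    cams = np.stack([np.sin(phi) * np.cos(theta),
                     np.sin(phi) * np.sin(theta),
                     np.cos(phi)], axis=1)

    origins, dirs = [], []
    rng = np.random.default_rng(0)
    for c in cams:
        # Sample targets near the antipodal point of the camera so every
        # ray points into the interior that contains the object.
        target = -c
        jitter = rng.normal(size=(rays_per_cam, 3)) * cap_radius
        jitter -= (jitter @ target)[:, None] * target  # keep jitter tangent
        d = (target + jitter) - c
        d /= np.linalg.norm(d, axis=1, keepdims=True)
        origins.append(np.repeat(c[None], rays_per_cam, axis=0))
        dirs.append(d)
    origins, dirs = np.concatenate(origins), np.concatenate(dirs)

    # Ray-mesh intersection; record depth of the first hit per ray.
    depths = np.full(n_cams * rays_per_cam, np.nan)
    locs, ray_idx, _ = mesh.ray.intersects_location(
        origins, dirs, multiple_hits=False)
    depths[ray_idx] = np.linalg.norm(locs - origins[ray_idx], axis=1)
    return depths
```

Concatenating the per-camera depths yields the compact, ordered 1D representation that downstream tasks such as completion can consume directly.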

* 17 pages 