



Geometric constraints between feature matches are critical in 3D point cloud registration. Existing approaches typically model unordered matches as a consistency graph and sample mutually consistent matches to generate hypotheses. However, explicit graph construction introduces noise, making it difficult for handcrafted geometric constraints to enforce consistency among matches. To overcome this, we propose HyperGCT, a flexible, dynamic, Hyper-GNN-learned geometric constraint that leverages high-order consistency among 3D correspondences. To our knowledge, HyperGCT is the first method to mine robust geometric constraints from dynamic hypergraphs for 3D registration. By dynamically optimizing the hypergraph through vertex and edge feature aggregation, HyperGCT effectively captures the correlations among correspondences, leading to accurate hypothesis generation. Extensive experiments on 3DMatch, 3DLoMatch, KITTI-LC, and ETH show that HyperGCT achieves state-of-the-art performance. Furthermore, our method is robust to graph noise, demonstrating a significant advantage in generalization. The code will be released.
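Such consistency graphs are typically built from the rigidity of Euclidean distances: two correspondences are compatible if they preserve the pairwise distance between their endpoints. The NumPy sketch below is our own minimal illustration of this classic second-order constraint that the abstract refers to, not HyperGCT's learned high-order one.

```python
import numpy as np

def compatibility_graph(src_pts, dst_pts, tau=0.1):
    """Build a pairwise consistency graph over putative correspondences.

    src_pts, dst_pts: (N, 3) arrays, where row i is a putative match
    (src_pts[i] <-> dst_pts[i]). Under a rigid transform, distances are
    preserved, so matches i and j are compatible when
    | ||s_i - s_j|| - ||d_i - d_j|| | < tau.
    """
    d_src = np.linalg.norm(src_pts[:, None] - src_pts[None], axis=-1)
    d_dst = np.linalg.norm(dst_pts[:, None] - dst_pts[None], axis=-1)
    adj = (np.abs(d_src - d_dst) < tau).astype(np.float32)
    np.fill_diagonal(adj, 0.0)  # no self-loops
    return adj
```

Hypothesis generation then amounts to sampling matches that form dense cliques (or, in HyperGCT's case, hyperedges) in this graph.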
The registration between the pre-operative model and the intra-operative surface is crucial in image-guided liver surgery, as it enables the effective use of pre-operative information during the procedure. However, the intra-operative surface, usually represented as a point cloud, often has limited coverage, especially in laparoscopic surgery, and is prone to holes and noise, posing significant challenges for registration methods. Point cloud completion methods have the potential to alleviate these issues. We therefore evaluate six state-of-the-art point cloud completion methods to identify the one best suited to liver surgery applications. We focus on a patient-specific approach to completing the liver point cloud from a partial liver surface under three cases: canonical pose, non-canonical pose, and canonical pose with noise. The transformer-based method AdaPoinTr outperforms all other methods at generating a complete point cloud from a given partial liver point cloud under the canonical pose. However, our findings reveal substantial performance degradation of all methods under non-canonical poses and noisy settings, highlighting their limitations and the need for a completion method robust enough for use in image-guided liver surgery.
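Completion quality in studies of this kind is commonly scored with the Chamfer distance between the completed and ground-truth clouds. The abstract does not name the metric used, so the following NumPy sketch is only an illustration of how such a comparison can be computed.

```python
import numpy as np

def chamfer_distance(pred, gt):
    """Symmetric Chamfer distance between point sets pred (N,3) and gt (M,3).

    A standard completion metric; illustrative only, as the paper's exact
    evaluation protocol is not stated in the abstract. The dense (N, M)
    distance matrix is fine for a sketch but memory-heavy for large clouds.
    """
    d = np.linalg.norm(pred[:, None] - gt[None], axis=-1)  # (N, M)
    return d.min(axis=1).mean() + d.min(axis=0).mean()
```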
Soft-tissue surgeries, such as tumor resections, are complicated by tissue deformations that can obscure the true location and shape of tissues. By representing tissue surfaces as point clouds and applying non-rigid point cloud registration (PCR) methods, surgeons can better understand tissue deformations before, during, and after surgery. Existing non-rigid PCR methods, such as feature-based approaches, struggle with robustness against challenges like noise, outliers, partial data, and large deformations, making accurate point correspondence difficult. Although learning-based PCR methods, particularly Transformer-based approaches, have recently shown promise thanks to their attention mechanisms for capturing interactions, their robustness remains limited in challenging scenarios. In this paper, we present DefTransNet, a novel end-to-end Transformer-based architecture for non-rigid PCR. DefTransNet is designed to address the key challenges of deformable registration, including large deformations, outliers, noise, and partial data: it takes source and target point clouds as input and outputs displacement vector fields. The proposed method incorporates a learnable transformation matrix to enhance robustness to affine transformations, integrates global and local geometric information, and captures long-range dependencies among points using Transformers. We validate our approach on four datasets: ModelNet, SynBench, 4DMatch, and DeformedTissue, using both synthetic and real-world data to demonstrate the generalization of our proposed method. Experimental results show that DefTransNet outperforms current state-of-the-art registration networks across various challenging conditions. Our code and data are publicly available.
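To make the input/output contract concrete: a network of this kind consumes a source and a target cloud and predicts a per-point displacement field that warps the source onto the target. The PyTorch sketch below shows that contract and a simple supervised loss; it is our own minimal illustration, not DefTransNet's architecture or training objective.

```python
import torch

def apply_displacement(src: torch.Tensor, disp: torch.Tensor) -> torch.Tensor:
    """Warp a source cloud by a per-point displacement field.

    src, disp: (B, N, 3) tensors. A non-rigid PCR network predicts disp
    so that src + disp aligns with the target cloud.
    """
    return src + disp

def l2_alignment_loss(warped: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Assumes known point-wise correspondence, e.g. on synthetic data;
    real pipelines typically use correspondence-free losses instead."""
    return ((warped - target) ** 2).sum(dim=-1).mean()
```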




Discriminative features are crucial for point cloud registration. Recent methods improve feature discriminability by distinguishing points in non-overlapping regions from those in overlapping regions. However, they still struggle to distinguish ambiguous structures within the overlapping regions, and the resulting ambiguous features yield a significant number of outlier matches from those regions. To solve this problem, we propose a prior-guided SMoE-based registration method that improves feature distinctiveness by dispatching potential correspondences to the same experts. Specifically, we propose a prior-guided SMoE module that fuses prior overlap and potential-correspondence embeddings for routing, assigning tokens to the most suitable experts for processing. In addition, we propose a registration framework built from a specific combination of Transformer layers and the prior-guided SMoE module. The proposed method not only locates the overlapping areas of the point clouds, but also finds more accurate correspondences within them. Extensive experiments demonstrate the effectiveness of our method, which achieves state-of-the-art registration recall (95.7\%/79.3\%) on the 3DMatch/3DLoMatch benchmarks. We also evaluate on ModelNet40 and likewise demonstrate excellent performance.
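For readers unfamiliar with sparse Mixture-of-Experts (SMoE) routing, the PyTorch sketch below shows a generic top-1 router in which a prior embedding is concatenated into the gating input. How the paper actually fuses the prior overlap and correspondence embeddings is its own design; this is merely our reading of the routing idea.

```python
import torch
import torch.nn as nn

class PriorGuidedRouter(nn.Module):
    """Generic top-1 SMoE router with a prior embedding fused into the
    gate input. Illustrative only; not the paper's exact module."""

    def __init__(self, dim: int, num_experts: int):
        super().__init__()
        self.gate = nn.Linear(2 * dim, num_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))

    def forward(self, tokens: torch.Tensor, prior: torch.Tensor) -> torch.Tensor:
        # tokens, prior: (N, dim). Tokens with similar priors receive similar
        # gate logits, so potential correspondences land on the same expert.
        logits = self.gate(torch.cat([tokens, prior], dim=-1))   # (N, E)
        weight, idx = torch.softmax(logits, dim=-1).max(dim=-1)  # (N,), (N,)
        out = torch.stack([self.experts[i](t) for i, t in zip(idx.tolist(), tokens)])
        return weight.unsqueeze(-1) * out  # gate-weighted expert outputs
```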
This work introduces BEV-LIO(LC), a novel LiDAR-Inertial Odometry (LIO) framework that combines Bird's Eye View (BEV) image representations of LiDAR data with geometry-based point cloud registration and incorporates loop closure (LC) through BEV image features. By normalizing point density, we project LiDAR point clouds into BEV images, thereby enabling efficient feature extraction and matching. A lightweight convolutional neural network (CNN) based feature extractor is employed to extract distinctive local and global descriptors from the BEV images. Local descriptors are used to match BEV images with FAST keypoints for reprojection error construction, while global descriptors facilitate loop closure detection. Reprojection error minimization is then integrated with point-to-plane registration within an iterated Extended Kalman Filter (iEKF). In the back-end, global descriptors are used to create a KD-tree-indexed keyframe database for accurate loop closure detection. When a loop closure is detected, Random Sample Consensus (RANSAC) computes a coarse transform from BEV image matching, which serves as the initial estimate for Iterative Closest Point (ICP). The refined transform is subsequently incorporated into a factor graph along with odometry factors, improving the global consistency of localization. Extensive experiments conducted in various scenarios with different LiDAR types demonstrate that BEV-LIO(LC) outperforms state-of-the-art methods, achieving competitive localization accuracy. Our code, video and supplementary materials can be found at https://github.com/HxCa1/BEV-LIO-LC.
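The BEV projection step itself is simple; the NumPy sketch below shows how a LiDAR cloud can be rasterized into a density-normalized BEV image. The grid resolution and the log-style normalization are our assumptions, since the paper's exact constants are not given in the abstract.

```python
import numpy as np

def lidar_to_bev(points: np.ndarray, res: float = 0.4, size: int = 256) -> np.ndarray:
    """Project a LiDAR cloud (N, 3) to a density-normalized BEV image.

    Each point is binned into an (size x size) grid of res-meter cells
    centered on the sensor; cell counts are log-compressed so dense
    regions do not saturate the image.
    """
    ij = np.floor(points[:, :2] / res).astype(int) + size // 2
    keep = ((ij >= 0) & (ij < size)).all(axis=1)
    img = np.zeros((size, size), dtype=np.float32)
    np.add.at(img, (ij[keep, 0], ij[keep, 1]), 1.0)  # accumulate point counts
    return np.log1p(img) / (np.log1p(img.max()) + 1e-9)
```

Feature extraction and FAST keypoint matching then operate on this image as if it were an ordinary grayscale frame.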
Iterative Closest Point (ICP) is a commonly used algorithm for estimating the transformation between two point clouds. The key idea of this work is to leverage recent advances in explainable AI for probabilistic ICP methods that provide uncertainty estimates. Concretely, we propose a method that can explain why a probabilistic ICP method produced a particular output. Our method is based on kernel SHAP (SHapley Additive exPlanations). With it, we assign an importance value to common sources of uncertainty in ICP, such as sensor noise, occlusion, and ambiguous environments. Our experimental results show that this explanation method can reasonably attribute the output uncertainty to its sources, providing a step towards robots that know when and why they failed in a human-interpretable manner.
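Kernel SHAP treats the system under explanation as a black box over a feature vector. The sketch below, using the shap library, shows the mechanics on a stand-in function: the uncertainty model, feature definitions, and coefficients are all hypothetical placeholders, not the paper's setup.

```python
import numpy as np
import shap  # pip install shap

def icp_uncertainty(X: np.ndarray) -> np.ndarray:
    """Hypothetical black box: maps uncertainty-source features
    (noise level, occlusion ratio, ambiguity score) to a scalar
    uncertainty, standing in for a probabilistic ICP pipeline."""
    noise, occl, ambig = X[:, 0], X[:, 1], X[:, 2]
    return 0.5 * noise + 2.0 * occl * ambig

background = np.zeros((1, 3))  # reference input: ideal conditions
explainer = shap.KernelExplainer(icp_uncertainty, background)
phi = explainer.shap_values(np.array([[0.3, 0.6, 0.8]]))
print(dict(zip(["noise", "occlusion", "ambiguity"], phi[0])))
```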
A high-fidelity digital simulation environment is crucial for accurately replicating physical operational processes. However, inconsistencies between simulation and physical environments result in low confidence in simulation outcomes, limiting their effectiveness in guiding real-world production. Unlike the traditional step-by-step point cloud "segmentation-registration" generation method, this paper introduces, for the first time, a novel Multi-Robot Manufacturing Digital Scene Generation (MRG) method that leverages multi-instance point cloud registration, specifically within manufacturing scenes. Tailored to the characteristics of industrial robots and manufacturing settings, an instance-focused transformer module is developed to delineate instance boundaries and capture correlations between local regions. Additionally, a hypothesis generation module is proposed to extract target instances while preserving key features. Finally, an efficient screening and optimization algorithm is designed to refine the final registration results. Experimental evaluations on the Scan2CAD and Welding-Station datasets demonstrate that: (1) the proposed method outperforms existing multi-instance point cloud registration techniques; (2) on the Scan2CAD dataset, it improves MR and MP over state-of-the-art methods by 12.15% and 17.79%, respectively; and (3) on the Welding-Station dataset, it improves MR and MP by 16.95% and 24.15%, respectively. This work marks the first application of multi-instance point cloud registration in manufacturing scenes, significantly advancing the precision and reliability of digital simulation environments for industrial applications.
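MR (mean recall) and MP (mean precision) score multi-instance registration by checking each predicted instance pose against the ground-truth poses. The NumPy sketch below implements one common form of this check; the rotation/translation thresholds and the greedy matching rule are our assumptions, not the paper's exact protocol.

```python
import numpy as np

def rot_err_deg(R1: np.ndarray, R2: np.ndarray) -> float:
    """Angular distance in degrees between two rotation matrices."""
    cos = np.clip((np.trace(R1.T @ R2) - 1.0) / 2.0, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos)))

def mr_mp(pred, gt, rot_thresh=15.0, trans_thresh=0.1):
    """Recall/precision over instance poses.

    pred, gt: lists of (R, t) tuples. A prediction is correct if it is
    within the thresholds of some ground-truth instance; each ground-truth
    instance counts as recovered at most once.
    """
    hit_gt, correct = set(), 0
    for Rp, tp in pred:
        for k, (Rg, tg) in enumerate(gt):
            if rot_err_deg(Rp, Rg) < rot_thresh and np.linalg.norm(tp - tg) < trans_thresh:
                correct += 1
                hit_gt.add(k)
                break
    return len(hit_gt) / max(len(gt), 1), correct / max(len(pred), 1)
```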




LiDAR sensors are essential for autonomous systems, yet LiDAR fiducial markers (LFMs) lag behind visual fiducial markers (VFMs) in adoption and utility. Bridging this gap is vital for robotics and computer vision but challenging due to the sparse, unstructured nature of 3D LiDAR data and 2D-focused fiducial marker designs. This dissertation proposes a novel framework for mapping and localization using LFMs, benefiting a variety of real-world applications, including the collection of 3D assets and training data for point cloud registration, 3D map merging, Augmented Reality (AR), and many more. First, an Intensity Image-based LiDAR Fiducial Marker (IFM) system is introduced, using thin, letter-sized markers compatible with VFMs. A detection method locates 3D fiducials from intensity images, enabling LiDAR pose estimation. Second, an enhanced algorithm extends detection to 3D maps, increasing marker range and facilitating tasks like 3D map merging. This method leverages both intensity and geometry, overcoming the limitations of geometry-only detection approaches. Third, a new LFM-based mapping and localization method registers unordered, low-overlap point clouds. It employs adaptive threshold detection and a two-level graph framework to solve a maximum a-posteriori (MAP) problem, optimizing point cloud and marker poses. Additionally, the Livox-3DMatch dataset is introduced, improving learning-based multiview point cloud registration methods. Extensive experiments with various LiDAR models in diverse indoor and outdoor scenes demonstrate the effectiveness and superiority of the proposed framework.
Staircases are among the most common structures in man-made scenes, yet humanoid robots and people with lower-limb disabilities or visual impairment find them difficult to traverse without the help of sensors and intelligent algorithms. Staircase scene perception is a prerequisite for recognition and localization, and it matters both for switching a robot's locomotion mode and for computing footstep placements on discontinuous terrain. However, several problems still constrain the application of this technology, such as low recognition accuracy, high sensor noise, unstable output signals, and high computational requirements. For scene reconstruction, binocular and time-of-flight (TOF) methods are easily affected by ambient light and the surface material of the target object. In contrast, a polarizer selectively transmits light polarized in a specific direction, so polarization-based reconstruction relies on the polarization state of the object surface; its advantages are that it is less affected by ambient light and does not depend on surface texture. In this paper, to detect staircases, we propose a contrast-enhancement algorithm that fuses polarization and light-intensity information and integrates point cloud segmentation based on YOLOv11. To achieve high-quality reconstruction, we propose a method that fuses polarized binocular and TOF depth information for three-dimensional (3D) reconstruction of the staircase. We also propose a joint calibration algorithm for the monocular camera and the TOF camera based on ICP registration and an improved gray wolf optimization algorithm.
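As a concrete reference for the ICP step in such a calibration, a point-to-point alignment in Open3D looks like the following. The file names, correspondence threshold, and identity initialization are placeholders, and the paper's pipeline additionally refines the result with an improved gray wolf optimizer.

```python
import numpy as np
import open3d as o3d

# Placeholder inputs: one cloud from the TOF camera, one from the
# polarized binocular reconstruction, expressed in their own frames.
src = o3d.io.read_point_cloud("tof_frame.ply")
dst = o3d.io.read_point_cloud("polarization_recon.ply")

result = o3d.pipelines.registration.registration_icp(
    src, dst,
    max_correspondence_distance=0.05,  # meters; tune to sensor noise
    init=np.eye(4),                    # identity as the coarse guess
    estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint())

print(result.transformation)  # 4x4 extrinsic estimate between the sensors
```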




In autonomous systems, sensor calibration is essential for safe and efficient navigation in dynamic environments. Accurate calibration is a prerequisite for reliable perception and planning tasks such as object detection and obstacle avoidance. Many existing LiDAR calibration methods require overlapping fields of view (FoVs), while others rely on external sensing devices or postulate a feature-rich environment. In addition, Sensor-to-Vehicle calibration is not supported by the vast majority of calibration algorithms. In this work, we propose CaLiV, a novel target-based technique for extrinsic Sensor-to-Sensor and Sensor-to-Vehicle calibration of multi-LiDAR systems. The algorithm works with non-overlapping FoVs and arbitrary calibration targets, and does not require any external sensing devices. First, we apply motion to produce FoV overlaps and utilize a simple unscented Kalman filter to obtain vehicle poses. Then, we use the Gaussian mixture model-based registration framework GMMCalib to align the point clouds in a common calibration frame. Finally, we reduce the task of recovering the sensor extrinsics to a minimization problem. We show that both translational and rotational Sensor-to-Sensor errors can be solved accurately by our method, and that all Sensor-to-Vehicle rotation angles can also be calibrated with high accuracy. We validate the simulation results in real-world experiments. The code is open source and available at https://github.com/TUMFTM/CaLiV.
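The minimization the abstract mentions ultimately reduces to estimating rigid transforms between aligned point sets. The closed-form Kabsch/Umeyama solution below (NumPy) is the standard building block for that step; it is only a sketch of the underlying math, not CaLiV's exact formulation.

```python
import numpy as np

def umeyama_rigid(A: np.ndarray, B: np.ndarray):
    """Least-squares rigid transform (R, t) mapping points A onto B.

    A, B: (N, 3) arrays of corresponding points. Returns R (3x3) and
    t (3,) minimizing sum ||R @ a_i + t - b_i||^2.
    """
    ca, cb = A.mean(axis=0), B.mean(axis=0)
    H = (A - ca).T @ (B - cb)            # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Reflection guard: force det(R) = +1.
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ S @ U.T
    return R, cb - R @ ca
```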