
Shenghai Yuan


MoPA: Multi-Modal Prior Aided Domain Adaptation for 3D Semantic Segmentation

Sep 21, 2023
Haozhi Cao, Yuecong Xu, Jianfei Yang, Pengyu Yin, Shenghai Yuan, Lihua Xie


Multi-modal unsupervised domain adaptation (MM-UDA) for 3D semantic segmentation is a practical solution for embedding semantic understanding in autonomous systems without expensive point-wise annotations. While previous MM-UDA methods can achieve overall improvement, they suffer from significantly imbalanced performance across classes, which restricts their adoption in real applications. This imbalance is mainly caused by: 1) self-training with imbalanced data and 2) the lack of pixel-wise 2D supervision signals. In this work, we propose Multi-modal Prior Aided (MoPA) domain adaptation to improve the performance on rare objects. Specifically, we develop Valid Ground-based Insertion (VGI) to rectify the imbalanced supervision signals by inserting rare objects collected from the wild, while avoiding the artificial artifacts that lead to trivial solutions. Meanwhile, our SAM consistency loss leverages 2D prior semantic masks from SAM as pixel-wise supervision signals, encouraging consistent predictions for each object in a semantic mask. The knowledge learned from modality-specific priors is then shared across modalities to achieve better rare-object segmentation. Extensive experiments show that our method achieves state-of-the-art performance on the challenging MM-UDA benchmark. Code will be available at https://github.com/AronCao49/MoPA.
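
As a rough illustration of the mask-level consistency idea described in the abstract, the sketch below penalizes pixels whose 2D predictions deviate from the mean prediction of their SAM mask. The function name, the KL-based formulation, and the tensor layout are assumptions for illustration, not the authors' actual loss.

```python
import torch
import torch.nn.functional as F

def sam_consistency_loss(logits, sam_masks, eps=1e-8):
    """Toy mask-level consistency term (illustrative, not MoPA's exact loss).

    logits:    (C, H, W) per-pixel class scores from the 2D branch.
    sam_masks: list of (H, W) boolean masks, one per SAM-proposed object.
    Pushes every pixel inside a mask towards the mask-averaged distribution.
    """
    probs = F.softmax(logits, dim=0)                     # (C, H, W)
    loss = logits.new_zeros(())
    for mask in sam_masks:
        if mask.sum() == 0:
            continue
        mask_probs = probs[:, mask]                      # (C, N) pixels in the mask
        target = mask_probs.mean(dim=1, keepdim=True).detach()
        # KL(pixel || mask mean), summed over classes, averaged over pixels
        kl = (mask_probs * (mask_probs.clamp_min(eps).log()
                            - target.clamp_min(eps).log())).sum(dim=0).mean()
        loss = loss + kl
    return loss / max(len(sam_masks), 1)
```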


Outram: One-shot Global Localization via Triangulated Scene Graph and Global Outlier Pruning

Sep 16, 2023
Pengyu Yin, Haozhi Cao, Thien-Minh Nguyen, Shenghai Yuan, Shuyang Zhang, Kangcheng Liu, Lihua Xie


One-shot LiDAR localization refers to the ability to estimate the robot pose from a single point cloud, which yields significant advantages for initialization and relocalization. In the point cloud domain, the topic has been extensively studied as a global descriptor retrieval (i.e., loop closure detection) and pose refinement (i.e., point cloud registration) problem, either in isolation or in combination. However, few works have explicitly considered the relationship between candidate retrieval and correspondence generation in pose estimation, leaving them brittle to substructure ambiguities. To this end, we propose Outram, a hierarchical one-shot localization algorithm that leverages substructures of 3D scene graphs for locally consistent correspondence search and global substructure-wise outlier pruning. This hierarchical process couples feature retrieval with correspondence extraction to resolve substructure ambiguities through local-to-global consistency refinement. We demonstrate the capability of Outram in a variety of scenarios across multiple large-scale outdoor datasets. Our implementation is open-sourced: https://github.com/Pamphlett/Outram.
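
The substructure-wise outlier pruning mentioned above builds on pairwise consistency between putative correspondences. A much-simplified stand-in, a greedy distance-consistency filter over matched centroids with a made-up threshold and function name, might look like this:

```python
import numpy as np

def prune_correspondences(src_pts, tgt_pts, tau=0.5):
    """Greedy pairwise distance-consistency pruning (a toy stand-in for
    substructure-wise outlier pruning, not Outram's implementation).

    src_pts, tgt_pts: (N, 3) matched points or cluster centroids, row i <-> row i.
    Returns indices of a mutually length-consistent subset.
    """
    n = len(src_pts)
    if n < 3:
        return np.arange(n)
    d_src = np.linalg.norm(src_pts[:, None] - src_pts[None], axis=-1)
    d_tgt = np.linalg.norm(tgt_pts[:, None] - tgt_pts[None], axis=-1)
    consistent = np.abs(d_src - d_tgt) < tau             # (N, N) consistency graph
    np.fill_diagonal(consistent, False)

    keep = np.ones(n, dtype=bool)
    while True:
        degrees = (consistent & keep & keep[:, None]).sum(axis=1)
        degrees[~keep] = n                               # ignore already-removed nodes
        worst = degrees.argmin()
        if degrees[worst] >= keep.sum() - 1:             # all kept pairs consistent
            break
        keep[worst] = False                              # drop the least consistent match
    return np.flatnonzero(keep)
```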

* 8 pages, 5 figures 

LIO-GVM: an Accurate, Tightly-Coupled Lidar-Inertial Odometry with Gaussian Voxel Map

Jun 30, 2023
Xingyu Ji, Shenghai Yuan, Pengyu Yin, Lihua Xie


This letter presents an accurate and robust LiDAR-inertial odometry framework. We fuse LiDAR scans with IMU data using a tightly-coupled iterated error-state Kalman filter for robust and fast localization. To achieve robust correspondence matching, we represent the points as a set of Gaussian distributions and evaluate the divergence in variance for outlier rejection. Based on the fitted distributions, a new residual metric is proposed for filter-based LiDAR-inertial odometry, moving beyond purely distance-based residuals by also accounting for variance disparity, which enriches both the comprehensiveness and the accuracy of the metric. Thanks to this design of the residual metric, we propose a simple yet effective voxel-only mapping scheme, which requires maintaining only one centroid and one covariance matrix per voxel. Experiments on different datasets demonstrate the robustness and accuracy of our framework for various data inputs and environments. For the benefit of the robotics community, we open-source the code at https://github.com/Ji1Xingyu/lio_gvm.
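
To make the "one centroid and one covariance per voxel" idea concrete, here is a minimal sketch of such a voxel map with incremental (Welford-style) statistics. The class name, voxel hashing, and update rule are illustrative assumptions, not the LIO-GVM code.

```python
import numpy as np
from collections import defaultdict

class GaussianVoxelMap:
    """Minimal sketch of a voxel map keeping one centroid and one covariance
    per voxel (illustrative, not the LIO-GVM implementation)."""

    def __init__(self, voxel_size=1.0):
        self.voxel_size = voxel_size
        # voxel key -> [count, mean (3,), sum of squared deviations (3, 3)]
        self.voxels = defaultdict(lambda: [0, np.zeros(3), np.zeros((3, 3))])

    def _key(self, p):
        return tuple(np.floor(p / self.voxel_size).astype(int))

    def insert(self, points):
        """Welford-style incremental mean/covariance update per voxel."""
        for p in points:
            n, mean, m2 = self.voxels[self._key(p)]
            n += 1
            delta = p - mean
            mean = mean + delta / n
            m2 = m2 + np.outer(delta, p - mean)
            self.voxels[self._key(p)] = [n, mean, m2]

    def gaussian(self, p):
        """Return (centroid, covariance) of the voxel containing p, or None."""
        n, mean, m2 = self.voxels.get(self._key(p), (0, None, None))
        if n < 2:
            return None
        return mean, m2 / (n - 1)
```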


MM-Fi: Multi-Modal Non-Intrusive 4D Human Dataset for Versatile Wireless Sensing

May 12, 2023
Jianfei Yang, He Huang, Yunjiao Zhou, Xinyan Chen, Yuecong Xu, Shenghai Yuan, Han Zou, Chris Xiaoxuan Lu, Lihua Xie


4D human perception plays an essential role in a myriad of applications, such as home automation and metaverse avatar simulation. However, existing solutions, which mainly rely on cameras and wearable devices, are either privacy-intrusive or inconvenient to use. To address these issues, wireless sensing has emerged as a promising alternative, leveraging LiDAR, mmWave radar, and WiFi signals for device-free human sensing. In this paper, we propose MM-Fi, the first multi-modal non-intrusive 4D human dataset with 27 daily and rehabilitation action categories, to bridge the gap between wireless sensing and high-level human perception tasks. MM-Fi consists of over 320k synchronized frames across five modalities from 40 human subjects. Various annotations are provided to support potential sensing tasks, e.g., human pose estimation and action recognition. Extensive experiments have been conducted to compare the sensing capability of individual and combined modalities across multiple tasks. We envision that MM-Fi can contribute to wireless sensing research on action recognition, human pose estimation, multi-modal learning, cross-modal supervision, and interdisciplinary healthcare research.

* Project page: https://ntu-aiot-lab.github.io/mm-fi 

Path Planning for Multiple Tethered Robots Using Topological Braids

Apr 29, 2023
Muqing Cao, Kun Cao, Shenghai Yuan, Kangcheng Liu, Yan Loi Wong, Lihua Xie


Path planning for multiple tethered robots is a challenging problem due to the complex interactions among the cables and the possibility of severe entanglements. Previous works on this problem either assume idealized cable models or provide no guarantee of entanglement-free paths. In this work, we present a new approach to this problem using the theory of braids. By establishing a topological equivalence between the physical cables and the space-time trajectories of the robots, and identifying particular braid patterns that emerge from entangled trajectories, we obtain the key finding that all complex entanglements stem from a finite number of interaction patterns between two or three robots. Hence, non-entanglement can be guaranteed by avoiding these interaction patterns in the trajectories of the robots. Based on this finding, we present a graph search algorithm over a permutation grid that efficiently searches for a feasible topology of paths and rejects braid patterns that result in entanglement. We demonstrate that the proposed algorithm achieves a 100% goal-reaching rate without entanglement for up to 10 drones with a slack cable model in a high-fidelity simulation platform. The practicality of the proposed approach is verified with three small tethered UAVs in indoor flight experiments.
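
The braid viewpoint above identifies entanglement with specific crossing patterns in the robots' space-time trajectories. A toy illustration of extracting such crossings, assuming at most one adjacent swap in the left-to-right ordering per time step and with made-up conventions and names, is sketched below:

```python
import numpy as np

def braid_word(trajectories, axis=0):
    """Toy extraction of a braid word from robot trajectories (an illustration
    of the braid viewpoint, not the paper's planner).

    trajectories: (T, N, 2) positions of N robots over T time steps.
    Returns a list of (i, sign) generators: robots at adjacent ranks i, i+1
    swapped, with the sign set by which robot passes in front (here: larger y,
    an arbitrary convention).
    """
    word = []
    order = np.argsort(trajectories[0, :, axis])          # left-to-right ranking
    for t in range(1, trajectories.shape[0]):
        new_order = np.argsort(trajectories[t, :, axis])
        for i in range(len(order) - 1):
            a, b = order[i], order[i + 1]
            # adjacent ranks i and i+1 exchanged their occupants -> one crossing
            if new_order[i] == b and new_order[i + 1] == a:
                ya, yb = trajectories[t, a, 1], trajectories[t, b, 1]
                word.append((i, 1 if ya > yb else -1))
        order = new_order
    return word
```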

* Accepted for presentation in Robotics: Science and Systems 2023 

DoubleBee: A Hybrid Aerial-Ground Robot with Two Active Wheels

Mar 20, 2023
Muqing Cao, Xinhang Xu, Shenghai Yuan, Kun Cao, Kangcheng Liu, Lihua Xie


We present the dynamic model and control of DoubleBee, a novel hybrid aerial-ground vehicle consisting of two propellers mounted on tilting servo motors and two motor-driven wheels. DoubleBee exploits the high energy efficiency of a bicopter configuration in aerial mode, and enjoys the low power consumption of a two-wheel self-balancing robot on the ground. Furthermore, the propeller thrusts act as additional control inputs on the ground, enabling a novel decoupled control scheme where the attitude of the robot is controlled using thrusts and the translational motion is realized using wheels. A prototype of DoubleBee is constructed using commercially available components. The power efficiency and the control performance of the robot are verified through comprehensive experiments. Challenging tasks in indoor and outdoor environments demonstrate the capability of DoubleBee to traverse unstructured environments, fly over and move under barriers, and climb steep and rough terrains.
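
The decoupled ground-mode control idea (propellers stabilize attitude, wheels handle translation) can be caricatured in a few lines; the gains, signs, and single-axis simplification below are assumptions for illustration, not the paper's controller.

```python
def ground_mode_control(pitch, pitch_rate, v, v_des,
                        kp_att=4.0, kd_att=0.8, kp_vel=1.5):
    """Very simplified decoupled ground-mode controller in the spirit of the
    abstract (hypothetical gains and structure, not DoubleBee's controller).

    Attitude (pitch) is stabilized via the propeller thrust command, while
    forward-velocity tracking is delegated to the wheel motors.
    """
    thrust_cmd = -kp_att * pitch - kd_att * pitch_rate   # propellers -> attitude
    wheel_cmd = kp_vel * (v_des - v)                     # wheels -> translation
    return thrust_cmd, wheel_cmd
```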


VR-SLAM: A Visual-Range Simultaneous Localization and Mapping System using Monocular Camera and Ultra-wideband Sensors

Mar 20, 2023
Thien Hoang Nguyen, Shenghai Yuan, Lihua Xie


In this work, we propose a simultaneous localization and mapping (SLAM) system using a monocular camera and ultra-wideband (UWB) sensors. Our system, referred to as VR-SLAM, is a multi-stage framework that leverages the strengths and compensates for the weaknesses of each sensor. Firstly, we introduce a UWB-aided 7-degree-of-freedom (scale, 3D position, and 3D orientation) global alignment module to initialize the visual odometry (VO) system in the world frame defined by the UWB anchors. This module loosely fuses up-to-scale VO and ranging data using either a quadratically constrained quadratic programming (QCQP) or a nonlinear least squares (NLS) algorithm, depending on whether a good initial guess is available. Secondly, we provide an accompanying theoretical analysis that includes the derivation and interpretation of the Fisher Information Matrix (FIM) and its determinant. Thirdly, we present UWB-aided bundle adjustment (UBA) and UWB-aided pose graph optimization (UPGO) modules to improve short-term odometry accuracy, reduce long-term drift, and correct alignment and scale errors. Extensive simulations and experiments show that our solution outperforms UWB-only, camera-only, and previous approaches, can quickly recover from tracking failure without relying on visual relocalization, and can effortlessly obtain a global map even when there are no loop closures.
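
As an illustration of the NLS branch of the 7-DoF global alignment, the sketch below fits scale, rotation, and translation so that transformed VO positions agree with the UWB range measurements; the variable names, log-scale parameterization, and robust loss are assumptions rather than the paper's formulation.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def align_vo_to_uwb(vo_positions, anchors, ranges, x0=None):
    """Sketch of an NLS 7-DoF (scale, rotation, translation) alignment between
    an up-to-scale VO trajectory and UWB ranges (illustrative only).

    vo_positions: (N, 3) camera positions in the VO frame
    anchors:      (M, 3) known UWB anchor positions in the world frame
    ranges:       (N, M) measured distances, NaN where unavailable
    """
    def residuals(x):
        s, rotvec, t = np.exp(x[0]), x[1:4], x[4:7]      # log-scale keeps s > 0
        p_world = s * Rotation.from_rotvec(rotvec).apply(vo_positions) + t
        pred = np.linalg.norm(p_world[:, None] - anchors[None], axis=-1)
        return (pred - ranges)[~np.isnan(ranges)]        # range residuals

    x0 = np.zeros(7) if x0 is None else x0
    sol = least_squares(residuals, x0, loss="huber")
    return np.exp(sol.x[0]), Rotation.from_rotvec(sol.x[1:4]), sol.x[4:7]
```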

* 13 pages 

Multi-Modal Continual Test-Time Adaptation for 3D Semantic Segmentation

Mar 18, 2023
Haozhi Cao, Yuecong Xu, Jianfei Yang, Pengyu Yin, Shenghai Yuan, Lihua Xie


Continual Test-Time Adaptation (CTTA) generalizes conventional Test-Time Adaptation (TTA) by assuming that the target domain is dynamic over time rather than stationary. In this paper, we explore Multi-Modal Continual Test-Time Adaptation (MM-CTTA) as a new extension of CTTA for 3D semantic segmentation. The key to MM-CTTA is to adaptively attend to the reliable modality while avoiding catastrophic forgetting during continual domain shifts, which is beyond the capability of previous TTA and CTTA methods. To fill this gap, we propose an MM-CTTA method called Continual Cross-Modal Adaptive Clustering (CoMAC) that addresses this task from two perspectives. On one hand, we propose an adaptive dual-stage mechanism that generates reliable cross-modal predictions by attending to the reliable modality based on the class-wise feature-centroid distance in the latent space. On the other hand, to perform test-time adaptation without catastrophic forgetting, we design class-wise momentum queues that capture confident target features for adaptation while stochastically restoring pseudo-source features to revisit source knowledge. We further introduce two new benchmarks to facilitate future exploration of MM-CTTA. Our experimental results show that our method achieves state-of-the-art performance on both benchmarks.
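
The class-wise momentum queues with stochastic restoration described above can be pictured with a small container like the one below; the class name, thresholds, and restoration probability are illustrative guesses, not CoMAC's actual design.

```python
import random
from collections import deque

class ClassMomentumQueues:
    """Toy class-wise queues of confident target features with stochastic
    restoration of stored pseudo-source features (an illustration of the idea,
    not the CoMAC implementation)."""

    def __init__(self, num_classes, maxlen=256, restore_prob=0.01):
        self.queues = [deque(maxlen=maxlen) for _ in range(num_classes)]
        self.pseudo_source = [[] for _ in range(num_classes)]
        self.restore_prob = restore_prob

    def push(self, feat, cls, confidence, conf_thresh=0.9):
        """Keep only confident target features, grouped by predicted class."""
        if confidence >= conf_thresh:
            self.queues[cls].append(feat)

    def snapshot_source(self, feats_by_class):
        """Record source-like features once (e.g., before adaptation starts)."""
        for cls, feats in enumerate(feats_by_class):
            self.pseudo_source[cls] = list(feats)

    def sample(self, cls):
        """Occasionally revisit a pseudo-source feature to resist forgetting."""
        if self.pseudo_source[cls] and random.random() < self.restore_prob:
            return random.choice(self.pseudo_source[cls])
        return self.queues[cls][-1] if self.queues[cls] else None
```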

* 15 pages, 6 tables, 7 figures 

Segregator: Global Point Cloud Registration with Semantic and Geometric Cues

Jan 18, 2023
Pengyu Yin, Shenghai Yuan, Haozhi Cao, Xingyu Ji, Shuyang Zhang, Lihua Xie


This paper presents Segregator, a global point cloud registration framework that exploits both semantic information and geometric distribution to efficiently build outlier-robust correspondences and search for inliers. Current state-of-the-art algorithms rely on point features to set up putative correspondences and refine them with pair-wise distance consistency checks. However, such a scheme suffers from degenerate cases, where the descriptive capability of local point features degrades, and from under-constrained cases, where length-preserving checks based on translation and rotation invariant measurements (l-TRIMs) cannot sufficiently determine whether a given correspondence is consistent with the others, leaving a larger NP-complete problem to solve. To tackle these problems, on the one hand, we propose a novel degeneracy-robust and efficient correspondence procedure that exploits both instance-level semantic clusters and geometric-level point features. On the other hand, Gaussian-distribution-based translation and rotation invariant measurements (G-TRIMs) are proposed to conduct the consistency check and further constrain the problem size. We validate the proposed algorithm through extensive experiments on real-world data. The code is available at: https://github.com/Pamphlett/Segregator.
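
To give a flavor of distribution-aware consistency checking between Gaussian clusters, here is a toy pairwise test; the measurement below (centroid distance plus a trace-based spread term) is an assumption for illustration and not necessarily how G-TRIMs are defined in the paper.

```python
import numpy as np

def gaussian_pair_measurement(mu_a, cov_a, mu_b, cov_b):
    """Simple translation/rotation-invariant measurement between two Gaussian
    clusters: centroid distance plus a spread term (illustrative only)."""
    return np.linalg.norm(mu_a - mu_b) + np.sqrt(np.trace(cov_a) + np.trace(cov_b))

def pairwise_consistency(src_gaussians, tgt_gaussians, pairs, tau=1.0):
    """Keep correspondence pairs (i, j) whose measurement agrees between source
    and target, mimicking a TRIM-style consistency test.

    src_gaussians, tgt_gaussians: lists of (mu, cov) tuples, index-aligned.
    pairs: iterable of (i, j) correspondence index pairs to test.
    """
    consistent = []
    for i, j in pairs:
        m_src = gaussian_pair_measurement(*src_gaussians[i], *src_gaussians[j])
        m_tgt = gaussian_pair_measurement(*tgt_gaussians[i], *tgt_gaussians[j])
        if abs(m_src - m_tgt) < tau:
            consistent.append((i, j))
    return consistent
```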

* 6 pages, 5 figures. Accepted to ICRA 2023 