Rong Xiong

Semantics-aware Motion Retargeting with Vision-Language Models

Dec 04, 2023

Haodong Zhang, ZhiKe Chen, Haocheng Xu, Lei Hao, Xiaofei Wu, Songcen Xu, Zhensong Zhang, Yue Wang, Rong Xiong

Abstract:Capturing and preserving motion semantics is essential to motion retargeting between animation characters. However, most of the previous works neglect the semantic information or rely on human-designed joint-level representations. Here, we present a novel Semantics-aware Motion reTargeting (SMT) method with the advantage of vision-language models to extract and maintain meaningful motion semantics. We utilize a differentiable module to render 3D motions. Then the high-level motion semantics are incorporated into the motion retargeting process by feeding the vision-language model with the rendered images and aligning the extracted semantic embeddings. To ensure the preservation of fine-grained motion details and high-level semantics, we adopt a two-stage pipeline consisting of skeleton-aware pre-training and fine-tuning with semantics and geometry constraints. Experimental results show the effectiveness of the proposed method in producing high-quality motion retargeting results while accurately preserving motion semantics. Project page can be found at https://sites.google.com/view/smtnet.

NGEL-SLAM: Neural Implicit Representation-based Global Consistent Low-Latency SLAM System

Nov 16, 2023

Yunxuan Mao, Xuan Yu, Kai Wang, Yue Wang, Rong Xiong, Yiyi Liao

Abstract:Neural implicit representations have emerged as a promising solution for providing dense geometry in Simultaneous Localization and Mapping (SLAM). However, existing methods in this direction fall short in terms of global consistency and low latency. This paper presents NGEL-SLAM to tackle the above challenges. To ensure global consistency, our system leverages a traditional feature-based tracking module that incorporates loop closure. Additionally, we maintain a globally consistent map by representing the scene using multiple neural implicit fields, enabling quick adjustment to the loop closure. Moreover, our system allows for fast convergence through the use of octree-based implicit representations. The combination of rapid response to loop closure and fast convergence makes our system a truly low-latency system that achieves global consistency. Our system enables rendering high-fidelity RGB-D images, along with extracting dense and complete surfaces. Experiments on both synthetic and real-world datasets suggest that our system achieves state-of-the-art tracking and mapping accuracy while maintaining low latency.

* 7 pages, 8 figures, 2024 ICRA under review

DORec: Decomposed Object Reconstruction Utilizing 2D Self-Supervised Features

Oct 19, 2023

Jun Wu, Sicheng Li, Sihui Ji, Yue Wang, Rong Xiong, Yiyi Liao

Abstract:Decomposing a target object from a complex background while reconstructing is challenging. Most approaches acquire the perception for object instances through the use of manual labels, but the annotation procedure is costly. The recent advancements in 2D self-supervised learning have brought new prospects to object-aware representation, yet it remains unclear how to leverage such noisy 2D features for clean decomposition. In this paper, we propose a Decomposed Object Reconstruction (DORec) network based on neural implicit representations. Our key idea is to transfer 2D self-supervised features into masks of two levels of granularity to supervise the decomposition, including a binary mask to indicate the foreground regions and a K-cluster mask to indicate the semantically similar regions. These two masks are complementary to each other and lead to robust decomposition. Experimental results show the superiority of DORec in segmenting and reconstructing the foreground object on various datasets.

A Two-stage Based Social Preference Recognition in Multi-Agent Autonomous Driving System

Oct 05, 2023

Jintao Xue, Dongkun Zhang, Rong Xiong, Yue Wang, Eryun Liu

Abstract:Multi-Agent Reinforcement Learning (MARL) has become a promising solution for constructing a multi-agent autonomous driving system (MADS) in complex and dense scenarios. But most methods consider agents acting selfishly, which leads to conflict behaviors. Some existing works incorporate the concept of social value orientation (SVO) to promote coordination, but they lack the knowledge of other agents' SVOs, resulting in conservative maneuvers. In this paper, we aim to tackle the mentioned problem by enabling the agents to understand other agents' SVOs. To accomplish this, we propose a two-stage system framework. Firstly, we train a policy by allowing the agents to share their ground truth SVOs to establish a coordinated traffic flow. Secondly, we develop a recognition network that estimates agents' SVOs and integrates it with the policy trained in the first stage. Experiments demonstrate that our developed method significantly improves the performance of the driving policy in MADS compared to two state-of-the-art MARL algorithms.

CNS: Correspondence Encoded Neural Image Servo Policy

Sep 16, 2023

Anzhe Chen, Hongxiang Yu, Yue Wang, Rong Xiong

Abstract:Image servo is an indispensable technique in robotic applications that helps to achieve high-precision positioning. The intermediate representation of image servo policy is important to sensor input abstraction and policy output guidance. Classical approaches achieve high precision but require clean keypoint correspondence, and suffer from a limited convergence basin or weak feature error robustness. Recent learning-based methods achieve moderate precision and a large convergence basin on specific scenes but face issues when generalizing to novel environments. In this paper, we encode keypoints and correspondence into a graph and use a graph neural network as the controller architecture. This design utilizes both advantages: generalizable intermediate representation from keypoint correspondence and strong modeling ability from the neural network. Other techniques including realistic data generation, feature clustering and distance decoupling are proposed to further improve efficiency, precision and generalization. Experiments in simulation and the real world verify the effectiveness of our method in speed (up to 40 fps along with the observer), precision (<0.3° and sub-millimeter accuracy) and generalization (sim-to-real without fine-tuning). Project homepage (full paper with supplementary text, video and code): https://hhcaz.github.io/CNS-home

3D Model-free Visual Localization System from Essential Matrix under Local Planar Motion

Sep 04, 2023

Yanmei Jiao, Binxin Zhang, Peng Jiang, Chaoqun Wang, Rong Xiong, Yue Wang

Abstract:Visual localization plays a critical role in the functionality of low-cost autonomous mobile robots. Current state-of-the-art approaches for achieving accurate visual localization are 3D scene-specific, requiring additional computational and storage resources to construct a 3D scene model when facing a new environment. An alternative approach of directly using a database of 2D images for visual localization offers more flexibility. However, such methods currently suffer from limited localization accuracy. In this paper, we propose an accurate and robust multiple checking-based 3D model-free visual localization system to address the aforementioned issues. To ensure high accuracy, our focus is on estimating the pose of a query image relative to the retrieved database images using 2D-2D feature matches. Theoretically, by incorporating the local planar motion constraint into both the estimation of the essential matrix and the triangulation stages, we reduce the minimum required feature matches for absolute pose estimation, thereby enhancing the robustness of outlier rejection. Additionally, we introduce a multiple-checking mechanism to ensure the correctness of the solution throughout the solving process. For validation, qualitative and quantitative experiments are performed on both simulation and two real-world datasets and the experimental results demonstrate a significant enhancement in both accuracy and robustness afforded by the proposed 3D model-free visual localization system.
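
As a rough illustration of why a planar motion constraint lowers the number of required matches, consider the textbook case of exactly planar motion. The symbols below are assumptions made for this note and are not taken from the paper: the camera y-axis is taken to be perpendicular to the motion plane, \theta is the yaw between the two views, and \phi is the direction of the unit-norm translation within the plane. The paper's local planar motion formulation may differ.

```latex
R = \begin{bmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{bmatrix},
\qquad
t = \begin{bmatrix} \sin\phi \\ 0 \\ \cos\phi \end{bmatrix},
\qquad
E = [t]_\times R =
\begin{bmatrix}
0 & -\cos\phi & 0 \\
\cos(\theta - \phi) & 0 & \sin(\theta - \phi) \\
0 & \sin\phi & 0
\end{bmatrix}
```

Under these assumptions E has only two unknowns, \theta and \phi, and each 2D-2D match contributes one epipolar equation x'^\top E x = 0. Two matches are therefore enough in principle, compared with five for a general calibrated essential matrix, which means fewer samples per hypothesis during outlier rejection.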

Sparse Waypoint Validity Checking for Self-Entanglement-Free Tethered Path Planning

Aug 30, 2023

Tong Yang, Jiangpin Liu, Yue Wang, Rong Xiong

Abstract:A novel mechanism to derive self-entanglement-free (SEF) paths for tethered differential-driven robots is proposed in this work. The problem is tailored to the deployment of tethered differential-driven robots in situations where an omni-directional tether retractor is not available. This is frequently encountered when it is impractical to concurrently equip an omni-directional tether retracting mechanism with other geometrically intricate devices, such as a manipulator, which is notably relevant in applications like disaster recovery, spatial exploration, etc. Without specific attention to the spatial relation between the shape of the tether and the pose of the mobile unit, the issue of self-entanglement arises when the robot moves, resulting in unsafe robot movements and the risk of damaging the tether. In this paper, the SEF constraint is first formulated as the boundedness of a relative angle function which characterises the angular difference between the tether stretching direction and the robot's heading direction. Then, a constrained searching-based path planning algorithm is proposed which produces a path that is sub-optimal whilst ensuring the avoidance of tether self-entanglement. Finally, the algorithmic efficiency of the proposed path planner is further enhanced by proving the conditioned sparsity of the primitive path validity checking module. The effectiveness of the proposed algorithm is assessed through case studies, comparing its performance against untethered differential-driven planners in challenging planning scenarios. A comparative analysis is further conducted between the normal node expansion module and the improved node expansion module which incorporates sparse waypoint validity checking. Real-world tests are also conducted to validate the algorithm's performance. An open-source implementation has also been made available for the benefit of the robotics community.
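
The idea of bounding the angle between the tether direction and the robot heading can be sketched in a few lines of Python. This is a heavily simplified illustration, not the authors' formulation: it assumes a straight tether running to a single fixed anchor point, a tether attached at the rear of the robot, and a wrapped angle; the names `relative_angle`, `is_sef` and the bound `max_dev` are hypothetical.

```python
import math

def wrap_to_pi(angle: float) -> float:
    """Wrap an angle in radians to the interval [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))

def relative_angle(x: float, y: float, heading: float, anchor: tuple) -> float:
    """Angle between the tether direction and the robot's rear axis.

    Assumption: the tether is a straight segment from the robot at (x, y)
    to a fixed anchor, and it leaves the robot from the rear.
    """
    tether_dir = math.atan2(anchor[1] - y, anchor[0] - x)
    return wrap_to_pi(tether_dir - (heading + math.pi))

def is_sef(waypoints, anchor, max_dev: float) -> bool:
    """Check that every (x, y, heading) waypoint keeps the angle within the bound."""
    return all(abs(relative_angle(x, y, th, anchor)) <= max_dev
               for x, y, th in waypoints)

anchor = (-1.0, 0.0)
driving_away = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]  # tether trails behind the robot
facing_anchor = [(0.0, 0.0, math.pi)]              # robot would drive over the tether
print(is_sef(driving_away, anchor, math.pi / 2))
print(is_sef(facing_anchor, anchor, math.pi / 2))
```

A planner in this spirit would run such a check on candidate waypoints during node expansion; the abstract's sparsity result concerns how few of those waypoints actually need checking, which this sketch does not attempt to reproduce.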

* This is a generalised version of the authors' ICRA23 conference paper

Exploiting Point-Wise Attention in 6D Object Pose Estimation Based on Bidirectional Prediction

Aug 17, 2023

Yuhao Yang, Jun Wu, Guangjian Zhang, Rong Xiong

Abstract:Traditional geometric registration based estimation methods only exploit the CAD model implicitly, which leads to their dependence on observation quality and vulnerability to occlusion. To address the problem, the paper proposes a bidirectional correspondence prediction network with a point-wise attention-aware mechanism. This network not only requires the model points to predict the correspondence but also explicitly models the geometric similarities between observations and the model prior. Our key insight is that the correlations between each model point and scene point provide essential information for learning point-pair matches. To further tackle the correlation noises brought by feature distribution divergence, we design a simple but effective pseudo-siamese network to improve feature homogeneity. Experimental results on the public datasets of LineMOD, YCB-Video, and Occ-LineMOD show that the proposed method achieves better performance than other state-of-the-art methods under the same evaluation criteria. Its robustness in estimating poses is greatly improved, especially in an environment with severe occlusions.

Leveraging BEV Representation for 360-degree Visual Place Recognition

May 23, 2023

Xuecheng Xu, Yanmei Jiao, Sha Lu, Xiaqing Ding, Rong Xiong, Yue Wang

Abstract:This paper investigates the advantages of using Bird's Eye View (BEV) representation in 360-degree visual place recognition (VPR). We propose a novel network architecture that utilizes the BEV representation in feature extraction, feature aggregation, and vision-LiDAR fusion, which bridges visual cues and spatial awareness. Our method extracts image features using standard convolutional networks and combines the features according to pre-defined 3D grid spatial points. To alleviate the mechanical and time misalignments between cameras, we further introduce deformable attention to learn the compensation. Upon the BEV feature representation, we then employ the polar transform and the Discrete Fourier transform for aggregation, which is shown to be rotation-invariant. In addition, the image and point cloud cues can be easily stated in the same coordinates, which benefits sensor fusion for place recognition. The proposed BEV-based method is evaluated in ablation and comparative studies on two datasets, including on-the-road and off-the-road scenarios. The experimental results verify the hypothesis that BEV can benefit VPR by its superior performance compared to baseline methods. To the best of our knowledge, this is the first trial of employing BEV representation in this task.
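
The rotation invariance claimed for the polar transform followed by the Discrete Fourier transform can be checked with a small NumPy sketch. It is an illustration under assumptions and not the paper's network: the BEV feature map is taken to be already resampled onto a polar grid with rows indexing angle, a yaw rotation is modelled as a circular shift along that axis, and the function name `rotation_invariant_descriptor` is hypothetical.

```python
import numpy as np

def rotation_invariant_descriptor(polar_bev: np.ndarray) -> np.ndarray:
    """Magnitude of the DFT taken along the angular axis (axis 0).

    A circular shift along the angular axis only changes the phase of each
    DFT coefficient, so the magnitude is unchanged by the shift.
    """
    spectrum = np.fft.fft(polar_bev, axis=0)
    return np.abs(spectrum)

rng = np.random.default_rng(0)
polar_bev = rng.random((64, 32))                 # (num_angles, num_radii), assumed layout
rotated = np.roll(polar_bev, shift=10, axis=0)   # yaw rotation as a circular shift

d0 = rotation_invariant_descriptor(polar_bev)
d1 = rotation_invariant_descriptor(rotated)
print(np.allclose(d0, d1))
```

The invariance is exact only for rotations that are whole multiples of the angular bin width; rotations in between introduce interpolation error that depends on the angular resolution.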

An Efficient Multi-solution Solver for the Inverse Kinematics of 3-Section Constant-Curvature Robots

May 02, 2023

Ke Qiu, Jingyu Zhang, Danying Sun, Rong Xiong, Haojian Lu, Yue Wang

Abstract:Piecewise constant curvature is a popular kinematics framework for continuum robots. Computing the model parameters from the desired end pose, known as the inverse kinematics problem, is fundamental in manipulation, tracking and planning tasks. In this paper, we propose an efficient multi-solution solver to address the inverse kinematics problem of 3-section constant-curvature robots by bridging both the theoretical reduction and numerical correction. We derive analytical conditions to simplify the original problem into a one-dimensional problem. Further, the equivalence of the two problems is formalised. In addition, we introduce an approximation with bounded error so that the one dimension becomes traversable while the remaining parameters are analytically solvable. With the theoretical results, the global search and numerical correction are employed to implement the solver. The experiments show that our solver is more efficient and has a higher success rate than the numerical methods when one solution is required, and demonstrate its ability to obtain multiple solutions with optimal path planning in a space with obstacles.

* Robotics: Science and Systems 2023
