Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Sikai Guo

Real-to-Sim for Highly Cluttered Environments via Physics-Consistent Inter-Object Reasoning

Feb 13, 2026

Tianyi Xiang, Jiahang Cao, Sikai Guo, Guoyang Zhao, Andrew F. Luo, Jun Ma

Abstract:Reconstructing physically valid 3D scenes from single-view observations is a prerequisite for bridging the gap between visual perception and robotic control. However, in scenarios requiring precise contact reasoning, such as robotic manipulation in highly cluttered environments, geometric fidelity alone is insufficient. Standard perception pipelines often neglect physical constraints, resulting in invalid states, e.g., floating objects or severe inter-penetration, rendering downstream simulation unreliable. To address these limitations, we propose a novel physics-constrained Real-to-Sim pipeline that reconstructs physically consistent 3D scenes from single-view RGB-D data. Central to our approach is a differentiable optimization pipeline that explicitly models spatial dependencies via a contact graph, jointly refining object poses and physical properties through differentiable rigid-body simulation. Extensive evaluations in both simulation and real-world settings demonstrate that our reconstructed scenes achieve high physical fidelity and faithfully replicate real-world contact dynamics, enabling stable and reliable contact-rich manipulation.

* Project page: https://physics-constrained-real2sim.github.io

Via

Access Paper or Ask Questions

RoboDexVLM: Visual Language Model-Enabled Task Planning and Motion Control for Dexterous Robot Manipulation

Mar 03, 2025

Haichao Liu, Sikai Guo, Pengfei Mai, Jiahang Cao, Haoang Li, Jun Ma

Figure 1 for RoboDexVLM: Visual Language Model-Enabled Task Planning and Motion Control for Dexterous Robot Manipulation

Figure 2 for RoboDexVLM: Visual Language Model-Enabled Task Planning and Motion Control for Dexterous Robot Manipulation

Figure 3 for RoboDexVLM: Visual Language Model-Enabled Task Planning and Motion Control for Dexterous Robot Manipulation

Figure 4 for RoboDexVLM: Visual Language Model-Enabled Task Planning and Motion Control for Dexterous Robot Manipulation

Abstract:This paper introduces RoboDexVLM, an innovative framework for robot task planning and grasp detection tailored for a collaborative manipulator equipped with a dexterous hand. Previous methods focus on simplified and limited manipulation tasks, which often neglect the complexities associated with grasping a diverse array of objects in a long-horizon manner. In contrast, our proposed framework utilizes a dexterous hand capable of grasping objects of varying shapes and sizes while executing tasks based on natural language commands. The proposed approach has the following core components: First, a robust task planner with a task-level recovery mechanism that leverages vision-language models (VLMs) is designed, which enables the system to interpret and execute open-vocabulary commands for long sequence tasks. Second, a language-guided dexterous grasp perception algorithm is presented based on robot kinematics and formal methods, tailored for zero-shot dexterous manipulation with diverse objects and commands. Comprehensive experimental results validate the effectiveness, adaptability, and robustness of RoboDexVLM in handling long-horizon scenarios and performing dexterous grasping. These results highlight the framework's ability to operate in complex environments, showcasing its potential for open-vocabulary dexterous manipulation. Our open-source project page can be found at https://henryhcliu.github.io/robodexvlm.

Via

Access Paper or Ask Questions

Dynamic Object Tracking for Quadruped Manipulator with Spherical Image-Based Approach

Jul 14, 2023

Tianlin Zhang, Sikai Guo, Xiaogang Xiong, Wanlei Li, Zezheng Qi, Yunjiang Lou

Abstract:Exactly estimating and tracking the motion of surrounding dynamic objects is one of important tasks for the autonomy of a quadruped manipulator. However, with only an onboard RGB camera, it is still a challenging work for a quadruped manipulator to track the motion of a dynamic object moving with unknown and changing velocities. To address this problem, this manuscript proposes a novel image-based visual servoing (IBVS) approach consisting of three elements: a spherical projection model, a robust super-twisting observer, and a model predictive controller (MPC). The spherical projection model decouples the visual error of the dynamic target into linear and angular ones. Then, with the presence of the visual error, the robustness of the observer is exploited to estimate the unknown and changing velocities of the dynamic target without depth estimation. Finally, the estimated velocity is fed into the model predictive controller (MPC) to generate joint torques for the quadruped manipulator to track the motion of the dynamical target. The proposed approach is validated through hardware experiments and the experimental results illustrate the approach's effectiveness in improving the autonomy of the quadruped manipulator.

Via

Access Paper or Ask Questions