Tongtong Cao

RePO-VLA: Recovery-Driven Policy Optimization for Vision-Language-Action Models

May 10, 2026
Via arXiv

Anticipation-VLA: Solving Long-Horizon Embodied Tasks via Anticipation-based Subgoal Generation

May 03, 2026
Via arXiv

Efficient Camera Pose Augmentation for View Generalization in Robotic Policy Learning

Mar 31, 2026
Via arXiv

Do World Action Models Generalize Better than VLAs? A Robustness Study

Mar 23, 2026
Via arXiv

H-WM: Robotic Task and Motion Planning Guided by Hierarchical World Model

Feb 11, 2026
Via arXiv

Mem2Ego: Empowering Vision-Language Models with Global-to-Ego Memory for Long-Horizon Embodied Navigation

Feb 20, 2025
Via arXiv

SpatialCoT: Advancing Spatial Reasoning through Coordinate Alignment and Chain-of-Thought for Embodied Task Planning

Jan 17, 2025
Via arXiv

UniGaussian: Driving Scene Reconstruction from Multiple Camera Models via Unified Gaussian Representations

Nov 22, 2024
Via arXiv

3DArticCyclists: Generating Simulated Dynamic 3D Cyclists for Human-Object Interaction (HOI) and Autonomous Driving Applications

Oct 14, 2024
Via arXiv

AutoSplat: Constrained Gaussian Splatting for Autonomous Driving Scene Reconstruction

Jul 02, 2024
Via arXiv