Songen Gu

VistaBot: View-Robust Robot Manipulation via Spatiotemporal-Aware View Synthesis

Apr 23, 2026
Via arXiv

PokeVLA: Empowering Pocket-Sized Vision-Language-Action Model with Comprehensive World Knowledge Guidance

Apr 22, 2026
Via arXiv

OmniVTA: Visuo-Tactile World Modeling for Contact-Rich Robotic Manipulation

Mar 19, 2026
Via arXiv

Say, Dream, and Act: Learning Video World Models for Instruction-Driven Robot Manipulation

Feb 11, 2026
Via arXiv

World In Your Hands: A Large-Scale and Open-source Ecosystem for Learning Human-centric Manipulation in the Wild

Dec 30, 2025
Via arXiv

OccTENS: 3D Occupancy World Model via Temporal Next-Scale Prediction

Sep 04, 2025
Via arXiv

MAG: Multi-Modal Aligned Autoregressive Co-Speech Gesture Generation without Vector Quantization

Mar 18, 2025
Via arXiv

VRsketch2Gaussian: 3D VR Sketch Guided 3D Object Generation with Gaussian Splatting

Mar 16, 2025
Via arXiv

DOME: Taming Diffusion Model into High-Fidelity Controllable Occupancy World Model

Oct 14, 2024
Via arXiv

HE-Drive: Human-Like End-to-End Driving with Vision Language Models

Oct 07, 2024
Via arXiv