Scene Understanding


Hand3R: Online 4D Hand-Scene Reconstruction in the Wild

Feb 03, 2026
Via arXiv

JRDB-Pose3D: A Multi-person 3D Human Pose and Shape Estimation Dataset for Robotics

Feb 03, 2026
Via arXiv

Aligning Forest and Trees in Images and Long Captions for Visually Grounded Understanding

Feb 03, 2026
Via arXiv

Relationship-Aware Hierarchical 3D Scene Graph for Task Reasoning

Feb 02, 2026
Via arXiv

Language Movement Primitives: Grounding Language Models in Robot Motion

Feb 02, 2026
Via arXiv

RegionReasoner: Region-Grounded Multi-Round Visual Reasoning

Feb 03, 2026
Via arXiv

AgenticLab: A Real-World Robot Agent Platform that Can See, Think, and Act

Feb 02, 2026
Via arXiv

LongVPO: From Anchored Cues to Self-Reasoning for Long-Form Video Preference Optimization

Feb 02, 2026
Via arXiv

Enhancing Indoor Occupancy Prediction via Sparse Query-Based Multi-Level Consistent Knowledge Distillation

Feb 02, 2026
Via arXiv

GSR: Learning Structured Reasoning for Embodied Manipulation

Feb 02, 2026
Via arXiv