Zhide Zhong

DualCoT-VLA: Visual-Linguistic Chain of Thought via Parallel Reasoning for Vision-Language-Action Models

Mar 23, 2026

DyGeoVLN: Infusing Dynamic Geometry Foundation Model into Vision-Language Navigation

Mar 22, 2026

S-VAM: Shortcut Video-Action Model by Self-Distilling Geometric and Semantic Foresight

Mar 17, 2026

Rethinking the Practicality of Vision-language-action Model: A Comprehensive Benchmark and An Improved Baseline

Feb 26, 2026

Embodied Robot Manipulation in the Era of Foundation Models: Planning and Learning Perspectives

Dec 28, 2025

FlowVLA: Thinking in Motion with a Visual Chain of Thought

Aug 25, 2025

Accelerating Vision-Language-Action Model Integrated with Action Chunking via Parallel Decoding

Mar 04, 2025

ASSIST: Interactive Scene Nodes for Scalable and Realistic Indoor Simulation

Nov 10, 2023

MARS: An Instance-aware, Modular and Realistic Simulator for Autonomous Driving

Jul 27, 2023