Shanghang Zhang

RoboBrain 2.0 Technical Report

Jul 02, 2025
Via arXiv

AC-DiT: Adaptive Coordination Diffusion Transformer for Mobile Manipulation

Jul 02, 2025
Via arXiv

MinD: Unified Visual Imagination and Control via Hierarchical World Models

Jun 23, 2025
Via arXiv

Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs

Jun 12, 2025
Via arXiv

Video-CoT: A Comprehensive Dataset for Spatiotemporal Understanding of Videos Based on Chain-of-Thought

Jun 12, 2025
Via arXiv

SpikePingpong: High-Frequency Spike Vision-based Robot Learning for Precise Striking in Table Tennis Game

Jun 07, 2025
Via arXiv

RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics

Jun 04, 2025
Via arXiv

GeoDrive: 3D Geometry-Informed Driving World Model with Precise Action Control

May 29, 2025
Via arXiv

OmniIndoor3D: Comprehensive Indoor 3D Reconstruction

May 27, 2025
Via arXiv

SpikeGen: Generative Framework for Visual Spike Stream Processing

May 23, 2025
Via arXiv