Shanghang Zhang

4D Visual Pre-training for Robot Learning

Aug 24, 2025

HumanoidVerse: A Versatile Humanoid for Vision-Language Guided Multi-Object Rearrangement

Aug 23, 2025

$NavA^3$: Understanding Any Instruction, Navigating Anywhere, Finding Anything

Aug 06, 2025

AC-DiT: Adaptive Coordination Diffusion Transformer for Mobile Manipulation

Jul 02, 2025

RoboBrain 2.0 Technical Report

Jul 02, 2025

MinD: Unified Visual Imagination and Control via Hierarchical World Models

Jun 23, 2025

Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs

Jun 12, 2025

Video-CoT: A Comprehensive Dataset for Spatiotemporal Understanding of Videos Based on Chain-of-Thought

Jun 12, 2025

SpikePingpong: High-Frequency Spike Vision-based Robot Learning for Precise Striking in Table Tennis Game

Jun 07, 2025

RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics

Jun 04, 2025