Shanghang Zhang

Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs

Jun 12, 2025
Via arXiv

Video-CoT: A Comprehensive Dataset for Spatiotemporal Understanding of Videos Based on Chain-of-Thought

Jun 12, 2025
Via arXiv

SpikePingpong: High-Frequency Spike Vision-based Robot Learning for Precise Striking in Table Tennis Game

Jun 07, 2025
Via arXiv

RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics

Jun 04, 2025
Via arXiv

GeoDrive: 3D Geometry-Informed Driving World Model with Precise Action Control

May 29, 2025
Via arXiv

OmniIndoor3D: Comprehensive Indoor 3D Reconstruction

May 27, 2025
Via arXiv

SpikeGen: Generative Framework for Visual Spike Stream Processing

May 23, 2025
Via arXiv

ACU: Analytic Continual Unlearning for Efficient and Exact Forgetting with Privacy Preservation

May 18, 2025
Via arXiv

AFCL: Analytic Federated Continual Learning for Spatio-Temporal Invariance of Non-IID Data

May 18, 2025
Via arXiv

H2R: A Human-to-Robot Data Augmentation for Robot Pre-training from Videos

May 17, 2025
Via arXiv