Wanpeng Zhang

RealDexUMI: A Wearable Universal Manipulation Interface for Dexterous Robot Learning

Jun 04, 2026
Via arXiv

Unmasking the Illusion of Embodied Reasoning in Vision-Language-Action Models

Apr 20, 2026
Via arXiv

Conservative Offline Robot Policy Learning via Posterior-Transition Reweighting

Mar 17, 2026
Via arXiv

Joint-Aligned Latent Action: Towards Scalable VLA Pretraining in the Wild

Feb 25, 2026
Via arXiv

Rethinking Visual-Language-Action Model Scaling: Alignment, Mixture, and Regularization

Feb 10, 2026
Via arXiv

Being-H0.5: Scaling Human-Centric Robot Learning for Cross-Embodiment Generalization

Jan 19, 2026
Via arXiv

Spatial-Aware VLA Pretraining through Visual-Physical Alignment from Human Videos

Dec 15, 2025
Via arXiv

EfficientLLM: Scalable Pruning-Aware Pretraining for Architecture-Agnostic Edge Language Models

Feb 10, 2025
Via arXiv

VideoOrion: Tokenizing Object Dynamics in Videos

Nov 25, 2024
Via arXiv

From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities

Oct 03, 2024
Via arXiv