Xinggang Wang

MotionVLA: Injecting Geometric Motion into Vision-Language-Action Model

Jun 06, 2026
Via arXiv

Food-R1: A Unified Multi-Task Food Vision-Language Model with Reinforcement Learning

Jun 03, 2026
Via arXiv

RAD-2: Scaling Reinforcement Learning in a Generator-Discriminator Framework

Apr 16, 2026
Via arXiv

WUTDet: A 100K-Scale Ship Detection Dataset and Benchmarks with Dense Small Objects

Apr 09, 2026
Via arXiv

UniDriveVLA: Unifying Understanding, Perception, and Action Planning for Autonomous Driving

Apr 02, 2026
Via arXiv

Mixture-of-Depths Attention

Mar 16, 2026
Via arXiv

Senna-2: Aligning VLM and End-to-End Driving Policy for Consistent Decision Making and Planning

Mar 11, 2026
Via arXiv

OmniTrack: General Motion Tracking via Physics-Consistent Reference

Feb 27, 2026
Via arXiv

Spa3R: Predictive Spatial Field Modeling for 3D Visual Reasoning

Feb 24, 2026
Via arXiv

TriC-Motion: Tri-Domain Causal Modeling Grounded Text-to-Motion Generation

Feb 09, 2026
Via arXiv