Picture for Xiangtai Li

Xiangtai Li

Vision-Language-Action Models for Autonomous Driving: Past, Present, and Future

Add code
Dec 18, 2025
Viaarxiv icon

RecTok: Reconstruction Distillation along Rectified Flow

Add code
Dec 17, 2025
Viaarxiv icon

EditMGT: Unleashing Potentials of Masked Generative Transformers in Image Editing

Add code
Dec 12, 2025
Viaarxiv icon

WorldLens: Full-Spectrum Evaluations of Driving World Models in Real World

Add code
Dec 11, 2025
Viaarxiv icon

MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation

Add code
Nov 18, 2025
Viaarxiv icon

Visual Spatial Tuning

Add code
Nov 07, 2025
Viaarxiv icon

Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark

Add code
Oct 30, 2025
Viaarxiv icon

From Masks to Worlds: A Hitchhiker's Guide to World Models

Add code
Oct 23, 2025
Viaarxiv icon

Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence

Add code
Oct 23, 2025
Viaarxiv icon

Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs

Add code
Oct 22, 2025
Viaarxiv icon