Mike Zheng Shou

Reinforcement Learning in Vision: A Survey

Aug 11, 2025
Via arXiv

VLA-Touch: Enhancing Vision-Language-Action Models with Dual-Level Tactile Feedback

Jul 23, 2025
Via arXiv

Show-o2: Improved Native Unified Multimodal Models

Jun 18, 2025
Via arXiv

macOSWorld: A Multilingual Interactive Benchmark for GUI Agents

Jun 05, 2025
Via arXiv

UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning

May 29, 2025
Via arXiv

D-AR: Diffusion via Autoregressive Models

May 29, 2025
Via arXiv

OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data

May 24, 2025
Via arXiv

Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models

May 22, 2025
Via arXiv

DD-Ranking: Rethinking the Evaluation of Dataset Distillation

May 19, 2025
Via arXiv

LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale

Apr 22, 2025
Via arXiv