Mike Zheng Shou

macOSWorld: A Multilingual Interactive Benchmark for GUI Agents

Jun 05, 2025
Via arXiv

D-AR: Diffusion via Autoregressive Models

May 29, 2025
Via arXiv

UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning

May 29, 2025
Via arXiv

OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data

May 24, 2025
Via arXiv

Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models

May 22, 2025
Via arXiv

DD-Ranking: Rethinking the Evaluation of Dataset Distillation

May 19, 2025
Via arXiv

LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale

Apr 22, 2025
Via arXiv

MP-Mat: A 3D-and-Instance-Aware Human Matting and Editing Framework with Multiplane Representation

Apr 20, 2025
Via arXiv

Tuning-Free Image Editing with Fidelity and Editability via Unified Latent Diffusion Model

Apr 08, 2025
Via arXiv

AssistPDA: An Online Video Surveillance Assistant for Video Anomaly Prediction, Detection, and Analysis

Mar 27, 2025
Via arXiv