Mike Zheng Shou

World-VLA-Loop: Closed-Loop Learning of Video World Model and VLA Policy

Feb 06, 2026
Via arXiv

ShowUI-Aloha: Human-Taught GUI Agent

Jan 12, 2026
Via arXiv

FocusUI: Efficient UI Grounding via Position-Preserving Visual Token Selection

Jan 07, 2026
Via arXiv

ShowUI-$π$: Flow-based Generative Models as GUI Dexterous Hands

Dec 31, 2025
Via arXiv

Factorized Learning for Temporally Grounded Video-Language Models

Dec 30, 2025
Via arXiv

Mitty: Diffusion-based Human-to-Robot Video Generation

Dec 19, 2025
Via arXiv

EVOLVE-VLA: Test-Time Training from Environment Feedback for Vision-Language-Action Models

Dec 16, 2025
Via arXiv

H2R-Grounder: A Paired-Data-Free Paradigm for Translating Human Interaction Videos into Physically Grounded Robot Videos

Dec 10, 2025
Via arXiv

OmniPSD: Layered PSD Generation with Diffusion Transformer

Dec 10, 2025
Via arXiv

Computer-Use Agents as Judges for Generative User Interface

Nov 19, 2025
Via arXiv