
Yunhai Tong

PerceptionDLM: Parallel Region Perception with Multimodal Diffusion Language Models

Jun 17, 2026
Via arXiv

Watch, Remember, Reason: Human-View Video Understanding with MLLMs

Jun 05, 2026
Via arXiv

LoomVideo: Unifying Multimodal Inputs into Video Generation and Editing

Jun 04, 2026
Via arXiv

One-Step Distillation of Discrete Diffusion Image Generators via Fixed-Point Iteration

May 20, 2026
Via arXiv

VideoZeroBench: Probing the Limits of Video MLLMs with Spatio-Temporal Evidence Verification

Apr 02, 2026
Via arXiv

Rethinking Vector Field Learning for Generative Segmentation

Mar 19, 2026
Via arXiv

RecTok: Reconstruction Distillation along Rectified Flow

Dec 17, 2025
Via arXiv

MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation

Nov 18, 2025
Via arXiv

Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence

Oct 23, 2025
Via arXiv

Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs

Oct 22, 2025
Via arXiv