Yunhai Tong

CyberV: Cybernetics for Test-time Scaling in Video Understanding

Jun 09, 2025
Via arXiv

Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models

May 30, 2025
Via arXiv

Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model

May 29, 2025
Via arXiv

Conditional Panoramic Image Generation via Masked Autoregressive Modeling

May 22, 2025
Via arXiv

MMaDA: Multimodal Large Diffusion Language Models

May 21, 2025
Via arXiv

Generative Classifier for Domain Generalization

Apr 03, 2025
Via arXiv

Training-free Diffusion Acceleration with Bottleneck Sampling

Mar 27, 2025
Via arXiv

Decouple and Track: Benchmarking and Improving Video Diffusion Transformers for Motion Transfer

Mar 21, 2025
Via arXiv

Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising Trajectory Sharpening

Feb 17, 2025
Via arXiv

Are They the Same? Exploring Visual Correspondence Shortcomings of Multimodal LLMs

Jan 08, 2025
Via arXiv