Yunhai Tong

CyberV: Cybernetics for Test-time Scaling in Video Understanding

Jun 09, 2025
Via arXiv

Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models

May 30, 2025
Via arXiv

Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model

May 29, 2025
Via arXiv

Conditional Panoramic Image Generation via Masked Autoregressive Modeling

May 22, 2025
Via arXiv

MMaDA: Multimodal Large Diffusion Language Models

May 21, 2025
Via arXiv

Generative Classifier for Domain Generalization

Apr 03, 2025
Via arXiv

Training-free Diffusion Acceleration with Bottleneck Sampling

Mar 27, 2025
Via arXiv

Decouple and Track: Benchmarking and Improving Video Diffusion Transformers for Motion Transfer

Mar 21, 2025
Via arXiv

Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising Trajectory Sharpening

Feb 17, 2025
Via arXiv

Are They the Same? Exploring Visual Correspondence Shortcomings of Multimodal LLMs

Jan 08, 2025
Via arXiv