Yunhai Tong

Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence

Oct 23, 2025, via arXiv

Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs

Oct 22, 2025, via arXiv

CyberV: Cybernetics for Test-time Scaling in Video Understanding

Jun 09, 2025, via arXiv

Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models

May 30, 2025, via arXiv

Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model

May 29, 2025, via arXiv

Conditional Panoramic Image Generation via Masked Autoregressive Modeling

May 22, 2025, via arXiv

MMaDA: Multimodal Large Diffusion Language Models

May 21, 2025, via arXiv

Generative Classifier for Domain Generalization

Apr 03, 2025, via arXiv

Training-free Diffusion Acceleration with Bottleneck Sampling

Mar 27, 2025, via arXiv

Decouple and Track: Benchmarking and Improving Video Diffusion Transformers for Motion Transfer

Mar 21, 2025, via arXiv