Yuzhong Zhao

Thinking with Images via Self-Calling Agent

Dec 11, 2025

Geometric-Mean Policy Optimization

Jul 28, 2025

From Easy to Hard: Building a Shortcut for Differentially Private Image Synthesis

Apr 02, 2025

Model as a Game: On Numerical and Spatial Consistency for Generative Games

Mar 27, 2025

Finger in Camera Speaks Everything: Unconstrained Air-Writing for Real-World

Dec 27, 2024

Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model

Nov 28, 2024

Evaluation of Text-to-Video Generation Models: A Dynamics Perspective

Jul 01, 2024

DynRefer: Delving into Region-level Multi-modality Tasks via Dynamic Resolution

May 25, 2024

Controllable Dense Captioner with Multimodal Embedding Bridging

Feb 01, 2024

VMamba: Visual State Space Model

Jan 18, 2024