Zhao Zhong

Baton: Explicit Semantic Blueprints for Joint Video-Audio Generation

May 24, 2026

Precise: SDE-Consistent Stochastic Sampling for RL Post-Training of Flow-Matching Models

May 22, 2026

DiffCap-Bench: A Comprehensive, Challenging, Robust Benchmark for Image Difference Captioning

May 06, 2026

Symbiotic-MoE: Unlocking the Synergy between Generation and Understanding

Apr 09, 2026

Unify-Agent: A Unified Multimodal Agent for World-Grounded Image Synthesis

Apr 01, 2026

OmniWeaving: Towards Unified Video Generation with Free-form Composition and Reasoning

Mar 25, 2026

Manifold-Aware Exploration for Reinforcement Learning in Video Generation

Mar 23, 2026

HYDRA: Unifying Multi-modal Generation and Understanding via Representation-Harmonized Tokenization

Mar 17, 2026

UniCom: Unified Multimodal Modeling via Compressed Continuous Semantic Representations

Mar 11, 2026

DisCa: Accelerating Video Diffusion Transformers with Distillation-Compatible Learnable Feature Caching

Feb 05, 2026