Zhuofan Zong

SlidesGen-Bench: Evaluating Slides Generation via Computational and Quantitative Metrics

Jan 14, 2026

DrivingGen: A Comprehensive Benchmark for Generative Video World Models in Autonomous Driving

Jan 04, 2026

WebGen-Agent: Enhancing Interactive Website Generation with Multi-Level Feedback and Step-Level Reinforcement Learning

Sep 26, 2025

T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT

May 01, 2025

ADT: Tuning Diffusion Models with Adversarial Supervision

Apr 15, 2025

VividFace: A Diffusion-Based Hybrid Framework for High-Fidelity Video Face Swapping

Dec 15, 2024

EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM

Dec 12, 2024

Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models

Jun 17, 2024

MoVA: Adapting Mixture of Vision Experts to Multimodal Context

Apr 19, 2024

CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching

Apr 04, 2024