Zhiyu Tan

DiverseDiT: Towards Diverse Representation Learning in Diffusion Transformers

Mar 04, 2026
Via arXiv

Diff-Aid: Inference-time Adaptive Interaction Denoising for Rectified Text-to-Image Generation

Feb 14, 2026
Via arXiv

Omni-Video 2: Scaling MLLM-Conditioned Diffusion for Unified Video Generation and Editing

Feb 09, 2026
Via arXiv

Unraveling MMDiT Blocks: Training-free Analysis and Enhancement of Text-conditioned Diffusion

Jan 05, 2026
Via arXiv

A unified multimodal understanding and generation model for cross-disciplinary scientific research

Jan 04, 2026
Via arXiv

Uni-cot: Towards Unified Chain-of-Thought Reasoning Across Text and Vision

Aug 07, 2025
Via arXiv

Omni-Video: Democratizing Unified Video Understanding and Generation

Jul 09, 2025
Via arXiv

SDPO: Importance-Sampled Direct Preference Optimization for Stable Diffusion Training

May 28, 2025
Via arXiv

Cockatiel: Ensembling Synthetic and Human Preferenced Training for Detailed Video Caption

Mar 12, 2025
Via arXiv

SARA: Structural and Adversarial Representation Alignment for Training-efficient Diffusion Models

Mar 11, 2025
Via arXiv