Di Zhang

Mol-R1: Towards Explicit Long-CoT Reasoning in Molecule Discovery

Aug 11, 2025

Score Augmentation for Diffusion Models

Aug 11, 2025

AudioGen-Omni: A Unified Multimodal Diffusion Transformer for Video-Synchronized Audio, Speech, and Song Generation

Aug 01, 2025

Imbalance in Balance: Online Concept Balancing in Generation Models

Jul 17, 2025

Beyond First-Order: Training LLMs with Stochastic Conjugate Subgradients and AdamW

Jul 01, 2025

GGTalker: Talking Head Synthesis with Generalizable Gaussian Priors and Identity-Specific Adaptation

Jun 26, 2025

Kling-Foley: Multimodal Diffusion Transformer for High-Quality Video-to-Audio Generation

Jun 24, 2025

FilMaster: Bridging Cinematic Principles and Generative AI for Automated Film Generation

Jun 23, 2025

SELT: Self-Evaluation Tree Search for LLMs with Task Decomposition

Jun 09, 2025

FullDiT2: Efficient In-Context Conditioning for Video Diffusion Transformers

Jun 05, 2025