Di Zhang

CMPhysBench: A Benchmark for Evaluating Large Language Models in Condensed Matter Physics

Aug 25, 2025

MUSE: Multi-Subject Unified Synthesis via Explicit Layout Semantic Expansion

Aug 20, 2025

Your Reward Function for RL is Your Best PRM for Search: Unifying RL and Search-Based TTS

Aug 19, 2025

Score Augmentation for Diffusion Models

Aug 11, 2025

Mol-R1: Towards Explicit Long-CoT Reasoning in Molecule Discovery

Aug 11, 2025

AudioGen-Omni: A Unified Multimodal Diffusion Transformer for Video-Synchronized Audio, Speech, and Song Generation

Aug 01, 2025

Imbalance in Balance: Online Concept Balancing in Generation Models

Jul 17, 2025

Beyond First-Order: Training LLMs with Stochastic Conjugate Subgradients and AdamW

Jul 01, 2025

GGTalker: Talking Head Synthesis with Generalizable Gaussian Priors and Identity-Specific Adaptation

Jun 26, 2025

Kling-Foley: Multimodal Diffusion Transformer for High-Quality Video-to-Audio Generation

Jun 24, 2025