Audio Generation


A Multi-Agent AI Framework for Immersive Audiobook Production through Spatial Audio and Neural Narration

May 08, 2025
Via arXiv

HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation

May 08, 2025
Via arXiv

Hearing and Seeing Through CLIP: A Framework for Self-Supervised Sound Source Localization

May 08, 2025
Via arXiv

FLAM: Frame-Wise Language-Audio Modeling

May 08, 2025
Via arXiv

Discrete Optimal Transport and Voice Conversion

May 07, 2025
Via arXiv

Normalize Everything: A Preconditioned Magnitude-Preserving Architecture for Diffusion-Based Speech Enhancement

May 08, 2025
Via arXiv

ELGAR: Expressive Cello Performance Motion Generation for Audio Rendition

May 07, 2025
Via arXiv

FlexSpeech: Towards Stable, Controllable and Expressive Text-to-Speech

May 08, 2025
Via arXiv

Score Distillation Sampling for Audio: Source Separation, Synthesis, and Beyond

May 07, 2025
Via arXiv

PAHA: Parts-Aware Audio-Driven Human Animation with Diffusion Model

May 07, 2025
Via arXiv