Audio Generation


Bagpiper: Solving Open-Ended Audio Tasks via Rich Captions

Add code
Feb 05, 2026
Viaarxiv icon

Audio ControlNet for Fine-Grained Audio Generation and Editing

Add code
Feb 04, 2026
Viaarxiv icon

Zero-Shot TTS With Enhanced Audio Prompts: Bsc Submission For The 2026 Wildspoof Challenge TTS Track

Add code
Feb 05, 2026
Viaarxiv icon

ERNIE 5.0 Technical Report

Add code
Feb 04, 2026
Viaarxiv icon

A$^2$-LLM: An End-to-end Conversational Audio Avatar Large Language Model

Add code
Feb 04, 2026
Viaarxiv icon

LALM-as-a-Judge: Benchmarking Large Audio-Language Models for Safety Evaluation in Multi-Turn Spoken Dialogues

Add code
Feb 04, 2026
Viaarxiv icon

OmniRAG-Agent: Agentic Omnimodal Reasoning for Low-Resource Long Audio-Video Question Answering

Add code
Feb 03, 2026
Viaarxiv icon

Conditional Flow Matching for Visually-Guided Acoustic Highlighting

Add code
Feb 03, 2026
Viaarxiv icon

Audit After Segmentation: Reference-Free Mask Quality Assessment for Language-Referred Audio-Visual Segmentation

Add code
Feb 03, 2026
Viaarxiv icon

MUSE: A Multi-agent Framework for Unconstrained Story Envisioning via Closed-Loop Cognitive Orchestration

Add code
Feb 03, 2026
Viaarxiv icon