Wenxi Chen

MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows

Aug 08, 2025

Towards Reliable Large Audio Language Model

May 25, 2025

MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

May 19, 2025

Towards Flow-Matching-based TTS without Classifier-Free Guidance

Apr 29, 2025

Enhancing Speech-to-Speech Dialogue Modeling with End-to-End Retrieval-Augmented Generation

Apr 27, 2025

EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting

Apr 22, 2025

SimulS2S-LLM: Unlocking Simultaneous Inference of Speech LLMs for Speech-to-Speech Translation

Apr 22, 2025

Leveraging Perturbation Robustness to Enhance Out-of-Distribution Detection

Mar 24, 2025

URO-Bench: A Comprehensive Benchmark for End-to-End Spoken Dialogue Models

Feb 25, 2025

SLAM-Omni: Timbre-Controllable Voice Interaction System with Single-Stage Training

Dec 20, 2024