Picture for Wenxi Chen

Wenxi Chen

WavCube: Unifying Speech Representation for Understanding and Generation via Semantic-Acoustic Joint Modeling

Add code
May 07, 2026
Viaarxiv icon

X-VC: Zero-shot Streaming Voice Conversion in Codec Space

Add code
Apr 14, 2026
Viaarxiv icon

SoulX-Duplug: Plug-and-Play Streaming State Prediction Module for Realtime Full-Duplex Speech Conversation

Add code
Mar 16, 2026
Viaarxiv icon

Modular Safety Guardrails Are Necessary for Foundation-Model-Enabled Robots in the Real World

Add code
Feb 03, 2026
Viaarxiv icon

SLAM-LLM: A Modular, Open-Source Multimodal Large Language Model Framework and Best Practice for Speech, Language, Audio and Music Processing

Add code
Jan 14, 2026
Viaarxiv icon

UltraVoice: Scaling Fine-Grained Style-Controlled Speech Conversations for Spoken Dialogue Models

Add code
Oct 26, 2025
Figure 1 for UltraVoice: Scaling Fine-Grained Style-Controlled Speech Conversations for Spoken Dialogue Models
Figure 2 for UltraVoice: Scaling Fine-Grained Style-Controlled Speech Conversations for Spoken Dialogue Models
Figure 3 for UltraVoice: Scaling Fine-Grained Style-Controlled Speech Conversations for Spoken Dialogue Models
Figure 4 for UltraVoice: Scaling Fine-Grained Style-Controlled Speech Conversations for Spoken Dialogue Models
Viaarxiv icon

MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows

Add code
Aug 08, 2025
Viaarxiv icon

Towards Reliable Large Audio Language Model

Add code
May 25, 2025
Viaarxiv icon

MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

Add code
May 19, 2025
Figure 1 for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
Figure 2 for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
Figure 3 for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
Figure 4 for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
Viaarxiv icon

Towards Flow-Matching-based TTS without Classifier-Free Guidance

Add code
Apr 29, 2025
Viaarxiv icon