Picture for Xie Chen

Xie Chen

The Interspeech 2026 Audio Reasoning Challenge: Evaluating Reasoning Process Quality for Audio Reasoning Models and Agents

Add code
Feb 15, 2026
Viaarxiv icon

MOVA: Towards Scalable and Synchronized Video-Audio Generation

Add code
Feb 09, 2026
Viaarxiv icon

Audio ControlNet for Fine-Grained Audio Generation and Editing

Add code
Feb 04, 2026
Viaarxiv icon

VIBEVOICE-ASR Technical Report

Add code
Jan 26, 2026
Viaarxiv icon

Habibi: Laying the Open-Source Foundation of Unified-Dialectal Arabic Speech Synthesis

Add code
Jan 20, 2026
Viaarxiv icon

SLAM-LLM: A Modular, Open-Source Multimodal Large Language Model Framework and Best Practice for Speech, Language, Audio and Music Processing

Add code
Jan 14, 2026
Viaarxiv icon

ReStyle-TTS: Relative and Continuous Style Control for Zero-Shot Speech Synthesis

Add code
Jan 07, 2026
Viaarxiv icon

Towards Fine-Grained and Multi-Granular Contrastive Language-Speech Pre-training

Add code
Jan 06, 2026
Viaarxiv icon

Task Vector in TTS: Toward Emotionally Expressive Dialectal Speech Synthesis

Add code
Dec 21, 2025
Viaarxiv icon

UltraVoice: Scaling Fine-Grained Style-Controlled Speech Conversations for Spoken Dialogue Models

Add code
Oct 26, 2025
Figure 1 for UltraVoice: Scaling Fine-Grained Style-Controlled Speech Conversations for Spoken Dialogue Models
Figure 2 for UltraVoice: Scaling Fine-Grained Style-Controlled Speech Conversations for Spoken Dialogue Models
Figure 3 for UltraVoice: Scaling Fine-Grained Style-Controlled Speech Conversations for Spoken Dialogue Models
Figure 4 for UltraVoice: Scaling Fine-Grained Style-Controlled Speech Conversations for Spoken Dialogue Models
Viaarxiv icon