Lei Xie

Nanjing University

FlashTTS: Fast Streaming TTS with MTP Acceleration and X-pred Mean Flow Distillation

Jun 09, 2026

MeanVC 2: Robust Low-Latency Streaming Zero-Shot Voice Conversion

Jun 08, 2026

G-MaP-SE: Guided Speech Enhancement via GMM-Based Prior Matching

Jun 07, 2026

Towards Unified Song Generation and Singing Voice Conversion with Accompaniment Co-Generation

Jun 05, 2026

Beyond Semantic Dominance: Cognitive Affective Reasoning and Empathetic Response Alignment in Audio Language Models

Jun 05, 2026

InfoMerge: Information-aware Token Compression for Efficient Video Large Language Models

Jun 01, 2026

SoulX-Transcriber: A Robust End-to-End Framework for Multi-Speaker Speech Transcription

Jun 01, 2026

Towards Fine-Grained Multi-Dimensional Speech Understanding: Data Pipeline, Benchmark, and Model

May 12, 2026

Listening with Time: Precise Temporal Awareness for Long-Form Audio Understanding

Apr 24, 2026

Full-Duplex Interaction in Spoken Dialogue Systems: A Comprehensive Study from the ICASSP 2026 HumDial Challenge

Apr 23, 2026