Lei Xie

Nanjing University

MSU-Bench: Towards Speaker-Centric Understanding in Conversational Multi-Speaker Scenarios

Jun 22, 2026
Via arXiv

MambaADv2: Evolving Duality-enhanced State Space Model for Unsupervised Anomaly Detection

Jun 22, 2026
Via arXiv

FlashTTS: Fast Streaming TTS with MTP Acceleration and X-pred Mean Flow Distillation

Jun 09, 2026
Via arXiv

MeanVC 2: Robust Low-Latency Streaming Zero-Shot Voice Conversion

Jun 08, 2026
Via arXiv

G-MaP-SE: Guided Speech Enhancement via GMM-Based Prior Matching

Jun 07, 2026
Via arXiv

Beyond Semantic Dominance: Cognitive Affective Reasoning and Empathetic Response Alignment in Audio Language Models

Jun 05, 2026
Via arXiv

Towards Unified Song Generation and Singing Voice Conversion with Accompaniment Co-Generation

Jun 05, 2026
Via arXiv

SoulX-Transcriber: A Robust End-to-End Framework for Multi-Speaker Speech Transcription

Jun 01, 2026
Via arXiv

InfoMerge: Information-aware Token Compression for Efficient Video Large Language Models

Jun 01, 2026
Via arXiv

Towards Fine-Grained Multi-Dimensional Speech Understanding: Data Pipeline, Benchmark, and Model

May 12, 2026
Via arXiv