
Lei Xie

Nanjing University

Listening with Time: Precise Temporal Awareness for Long-Form Audio Understanding

Apr 24, 2026 (via arXiv)

Full-Duplex Interaction in Spoken Dialogue Systems: A Comprehensive Study from the ICASSP 2026 HumDial Challenge

Apr 23, 2026 (via arXiv)

MINT-Bench: A Comprehensive Multilingual Benchmark for Instruction-Following Text-to-Speech

Apr 20, 2026 (via arXiv)

Audio-Cogito: Towards Deep Audio Reasoning in Large Audio Language Models

Apr 14, 2026 (via arXiv)

HumDial-EIBench: A Human-Recorded Multi-Turn Emotional Intelligence Benchmark for Audio Language Models

Apr 13, 2026 (via arXiv)

EvoTSE: Evolving Enrollment for Target Speaker Extraction

Apr 09, 2026 (via arXiv)

FastTurn: Unifying Acoustic and Streaming Semantic Cues for Low-Latency and Robust Turn Detection

Apr 07, 2026 (via arXiv)

Speaker-Reasoner: Scaling Interaction Turns and Reasoning Patterns for Timestamped Speaker-Attributed ASR

Apr 03, 2026 (via arXiv)

Semantic-Aware Interruption Detection in Spoken Dialogue Systems: Benchmark, Metric, and Model

Mar 25, 2026 (via arXiv)

YingMusic-Singer: Controllable Singing Voice Synthesis with Flexible Lyric Manipulation and Annotation-free Melody Guidance

Mar 25, 2026 (via arXiv)