Changhao Pan

A Survey of Full-Duplex Spoken Dialogue Systems: Architectural Hierarchy, Interaction Ontology, and Decision State Machine

Jun 17, 2026
Via arXiv

Spatial-Omni: Spatial Audio Understanding Integration in Multimodal LLMs via FOA Encoding

Jun 09, 2026
Via arXiv

Towards Streaming Synchronized Spatial Audio Generation via Autoregressive Diffusion Transformer

May 29, 2026
Via arXiv

SwanVoice: Expressive Long-Form Zero-Shot Speech Synthesis for Both Monologue and Dialogue

May 29, 2026
Via arXiv

Comprehensive Benchmarking of Long-Form Speech Generation in Diverse Scenarios

May 27, 2026
Via arXiv

TMD-Bench: A Multi-Level Evaluation Paradigm for Music-Dance Co-Generation

May 03, 2026
Via arXiv

Diffusion Model as a Generalist Segmentation Learner

Apr 27, 2026
Via arXiv

ImVideoEdit: Image-learning Video Editing via 2D Spatial Difference Attention Blocks

Apr 09, 2026
Via arXiv

Modeling and Benchmarking Spoken Dialogue Rewards with Modality and Colloquialness

Mar 16, 2026
Via arXiv

Synthetic Singers: A Review of Deep-Learning-based Singing Voice Synthesis Approaches

Jan 20, 2026
Via arXiv