Haoyuan Yu

Unit-Based Agent for Semi-Cascaded Full-Duplex Dialogue Systems

Jan 29, 2026

Haoyuan Yu, Yuxuan Chen, Minjie Cai

Abstract: Full-duplex voice interaction is crucial for natural human-computer interaction. We present a framework that decomposes complex dialogue into minimal conversational units, enabling the system to process each unit independently and predict when to transition to the next. This framework is instantiated as a semi-cascaded full-duplex dialogue system built around a multimodal large language model, supported by auxiliary modules such as voice activity detection (VAD) and text-to-speech (TTS) synthesis. The resulting system operates in a training-free, plug-and-play manner. Experiments on the HumDial dataset demonstrate the effectiveness of our framework, which ranks second among all teams on the test set of the Human-like Spoken Dialogue Systems Challenge (Track 2: Full-Duplex Interaction). Code is available at the GitHub repository https://github.com/yu-haoyuan/fd-badcat.

* ICASSP 2026 (Grand Challenge). https://github.com/yu-haoyuan/fd-badcat
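The abstract names voice activity detection (VAD) as one of the auxiliary modules around the multimodal LLM. As a rough illustration of how a VAD stream could be cut into candidate conversational units, the sketch below groups per-frame speech probabilities into segments using hysteresis thresholds and a minimum silence run. It is a minimal sketch assuming an upstream VAD that emits one speech probability per frame; `segment_units`, `Unit`, and the threshold values are hypothetical and are not taken from the authors' fd-badcat code.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Unit:
    """Hypothetical unit record: a span of VAD frames."""
    start_frame: int  # inclusive
    end_frame: int    # exclusive


def segment_units(
    speech_probs: Sequence[float],
    on_threshold: float = 0.6,   # assumed value: open a unit at or above this
    off_threshold: float = 0.4,  # assumed value: frames below this count as silence
    min_silence_frames: int = 3, # assumed value: silence run needed to close a unit
) -> List[Unit]:
    """Group per-frame speech probabilities into units with hysteresis.

    A unit opens when a frame reaches on_threshold and closes only after
    min_silence_frames consecutive frames fall below off_threshold, so short
    pauses do not split one unit into several.
    """
    units: List[Unit] = []
    in_speech = False
    start = 0
    silence_run = 0
    for i, p in enumerate(speech_probs):
        if not in_speech:
            if p >= on_threshold:
                in_speech = True
                start = i
                silence_run = 0
        elif p < off_threshold:
            silence_run += 1
            if silence_run >= min_silence_frames:
                # The unit ends at the first frame of the silent run.
                units.append(Unit(start, i - silence_run + 1))
                in_speech = False
                silence_run = 0
        else:
            # Any frame at or above off_threshold resets the silence run.
            silence_run = 0
    if in_speech:
        # Close a unit still open at end of stream, trimming trailing silence.
        units.append(Unit(start, len(speech_probs) - silence_run))
    return units


probs = [0.1, 0.9, 0.8, 0.2, 0.7, 0.1, 0.1, 0.1, 0.9, 0.9]
print(segment_units(probs))
```

Requiring several consecutive silent frames before closing a unit keeps brief hesitations inside one unit; the trade-off is extra end-of-unit delay proportional to `min_silence_frames`.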

From Turn-Taking to Synchronous Dialogue: A Survey of Full-Duplex Spoken Language Models

Sep 18, 2025

Yuxuan Chen, Haoyuan Yu

Abstract: True Full-Duplex (TFD) voice communication, which enables simultaneous listening and speaking with natural turn-taking, overlapping speech, and interruptions, represents a critical milestone toward human-like AI interaction. This survey comprehensively reviews Full-Duplex Spoken Language Models (FD-SLMs) in the LLM era. We establish a taxonomy distinguishing Engineered Synchronization (modular architectures) from Learned Synchronization (end-to-end architectures), and unify fragmented evaluation approaches into a framework encompassing Temporal Dynamics, Behavioral Arbitration, Semantic Coherence, and Acoustic Performance. Through comparative analysis of mainstream FD-SLMs, we identify three fundamental challenges (synchronous data scarcity, architectural divergence, and evaluation gaps) and provide a roadmap for advancing human-AI communication.
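Temporal Dynamics is one dimension of the evaluation framework above. The survey's exact metric definitions are not reproduced on this page, so the sketch below assumes two commonly used timing measures: per-turn response latency (gap between the end of a user turn and the next system onset) and total overlapping speech. The function names and the `(start_s, end_s)` interval format are hypothetical.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

# Hypothetical format: (start_s, end_s) in seconds, start < end.
Interval = Tuple[float, float]


def response_latencies(user: List[Interval], system: List[Interval]) -> List[Optional[float]]:
    """For each user turn, seconds until the first system turn starting at or after its end.

    Returns None when no system turn starts after that user turn ends.
    System onsets that fall inside a user turn are treated as overlap, not latency.
    """
    starts = sorted(s for s, _ in system)
    out: List[Optional[float]] = []
    for _, u_end in user:
        idx = bisect_left(starts, u_end)  # first system onset >= u_end
        out.append(starts[idx] - u_end if idx < len(starts) else None)
    return out


def overlap_seconds(user: List[Interval], system: List[Interval]) -> float:
    """Total time both parties speak at once (assumes no self-overlap per speaker)."""
    total = 0.0
    for us, ue in user:
        for ss, se in system:
            total += max(0.0, min(ue, se) - max(us, ss))
    return total


user_turns = [(0.0, 2.0), (5.0, 7.0)]
system_turns = [(2.5, 4.0), (6.5, 8.0)]
print(response_latencies(user_turns, system_turns))  # [0.5, None]
print(overlap_seconds(user_turns, system_turns))     # 0.5
```

In this toy example the second system turn starts before the user finishes, so it contributes 0.5 s of overlap instead of a latency value; a fuller evaluation would likely classify such onsets further (for example, backchannel versus barge-in).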
