Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:BW-EDA-EEND: Streaming End-to-End Neural Speaker Diarization for a Variable Number of Speakers

Nov 05, 2020

Eunjung Han, Chul Lee, Andreas Stolcke

Figure 1 for BW-EDA-EEND: Streaming End-to-End Neural Speaker Diarization for a Variable Number of Speakers

Figure 2 for BW-EDA-EEND: Streaming End-to-End Neural Speaker Diarization for a Variable Number of Speakers

Figure 3 for BW-EDA-EEND: Streaming End-to-End Neural Speaker Diarization for a Variable Number of Speakers

Figure 4 for BW-EDA-EEND: Streaming End-to-End Neural Speaker Diarization for a Variable Number of Speakers

Share this with someone who'll enjoy it:

Abstract:We present a novel online end-to-end neural diarization system, BW-EDA-EEND, that processes data incrementally for a variable number of speakers. The system is based on the EDA architecture of Horiguchi et al., but utilizes the incremental Transformer encoder, attending only to its left contexts and using block-level recurrence in the hidden states to carry information from block to block, making the algorithm complexity linear in time. We propose two variants of it. For unlimited-latency BW-EDA-EEND, which processes inputs in linear time, we show only moderate degradation for up to two speakers using a context size of 10 seconds compared to offline EDA-EEND. With more than two speakers, the accuracy gap between online and offline grows, but it still outperforms a baseline offline clustering diarization system for one to four speakers with unlimited context size, and shows comparable accuracy with context size of 10 seconds. For limited-latency BW-EDA-EEND, which produces diarization outputs block-by-block as audio arrives, we show accuracy comparable to the offline clustering-based system.

View paper on

Share this with someone who'll enjoy it:

Title:BW-EDA-EEND: Streaming End-to-End Neural Speaker Diarization for a Variable Number of Speakers

Paper and Code