Speaker Diarization


Speaker diarization is the process of partitioning an audio recording into speech segments and clustering those segments by speaker, answering the question of who spoke when.
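
The segment-then-cluster idea in this definition can be illustrated with a minimal sketch. It is not taken from any of the papers listed here. It assumes that a fixed-length window of audio has already been turned into a non-zero speaker embedding by some model, and the toy 2-D vectors, the `threshold` value, and the function names `cluster` and `merge_segments` are all hypothetical. Real systems typically use learned embeddings and stronger clustering or end-to-end models.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cluster(embeddings, threshold=0.8):
    """Greedy online clustering: assign each window to the most similar
    existing cluster, or open a new cluster if none reaches the threshold."""
    centroids = []  # running sum of member embeddings per cluster
    labels = []
    for emb in embeddings:
        best, best_sim = None, threshold
        for idx, centroid in enumerate(centroids):
            sim = cosine(emb, centroid)
            if sim >= best_sim:
                best, best_sim = idx, sim
        if best is None:
            centroids.append(list(emb))
            labels.append(len(centroids) - 1)
        else:
            centroids[best] = [c + e for c, e in zip(centroids[best], emb)]
            labels.append(best)
    return labels


def merge_segments(labels, window=1.0):
    """Merge consecutive windows with the same label into
    (start_seconds, end_seconds, speaker_label) segments."""
    segments = []
    for i, label in enumerate(labels):
        start, end = i * window, (i + 1) * window
        if segments and segments[-1][2] == label:
            segments[-1] = (segments[-1][0], end, label)
        else:
            segments.append((start, end, label))
    return segments


# Toy embeddings (assumed, one per 1-second window): two speakers.
toy = [(1.0, 0.1), (0.9, 0.2), (0.1, 1.0), (0.2, 0.9), (1.0, 0.0)]
labels = cluster(toy)
segments = merge_segments(labels)
print(segments)
```

With these toy inputs, the first two windows and the last window fall into one cluster and the middle two into another, so the output is three speaker-labelled segments.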

Guided Speaker Embedding

Oct 16, 2024

A Toolkit for Joint Speaker Diarization and Identification with Application to Speaker-Attributed ASR

Sep 09, 2024

Simultaneous Diarization and Separation of Meetings through the Integration of Statistical Mixture Models

Oct 28, 2024

STCON System for the CHiME-8 Challenge

Oct 17, 2024

Sortformer: Seamless Integration of Speaker Diarization and ASR by Bridging Timestamps and Tokens

Sep 10, 2024

Leveraging Self-Supervised Learning for Speaker Diarization

Sep 14, 2024

Integrating Audio, Visual, and Semantic Information for Enhanced Multimodal Speaker Diarization

Aug 22, 2024

LibriheavyMix: A 20,000-Hour Dataset for Single-Channel Reverberant Multi-Talker Speech Separation, ASR and Speaker Diarization

Sep 01, 2024

An approach to optimize inference of the DIART speaker diarization pipeline

Aug 05, 2024

Flow-TSVAD: Target-Speaker Voice Activity Detection via Latent Flow Matching

Sep 07, 2024