Kunal Dhawan

Sortformer: Seamless Integration of Speaker Diarization and ASR by Bridging Timestamps and Tokens

Sep 10, 2024
Via arXiv

Resource-Efficient Adaptation of Speech Foundation Models for Multi-Speaker ASR

Sep 02, 2024
Via arXiv

NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks

Aug 23, 2024
Via arXiv

Codec-ASR: Training Performant Automatic Speech Recognition Systems with Discrete Speech Representations

Jul 03, 2024
Via arXiv

Less is More: Accurate Speech Recognition & Translation without Web-Scale Data

Jun 28, 2024
Via arXiv

Spectral Codecs: Spectrogram-Based Audio Codecs for High Quality Speech Synthesis

Jun 07, 2024
Via arXiv

The CHiME-7 Challenge: System Description and Performance of NeMo Team's DASR System

Oct 18, 2023
Via arXiv

Property-Aware Multi-Speaker Data Simulation: A Probabilistic Modelling Technique for Synthetic Data Generation

Oct 18, 2023
Via arXiv

Discrete Audio Representation as an Alternative to Mel-Spectrograms for Speaker and Speech Recognition

Sep 19, 2023
Via arXiv

Enhancing Speaker Diarization with Large Language Models: A Contextual Beam Search Approach

Sep 14, 2023
Via arXiv