Nithin Rao Koluguri

Canary-1B-v2 & Parakeet-TDT-0.6B-v3: Efficient and High-Performance Models for Multilingual ASR and AST

Sep 17, 2025

Granary: Speech Recognition and Translation Dataset in 25 European Languages

May 19, 2025

Training and Inference Efficiency of Encoder-Decoder Speech Models

Mar 07, 2025

Sortformer: Seamless Integration of Speaker Diarization and ASR by Bridging Timestamps and Tokens

Sep 10, 2024

Longer is (Not Necessarily) Stronger: Punctuated Long-Sequence Training for Enhanced Speech Recognition and Translation

Sep 09, 2024

NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks

Aug 23, 2024

Codec-ASR: Training Performant Automatic Speech Recognition Systems with Discrete Speech Representations

Jul 03, 2024

Less is More: Accurate Speech Recognition & Translation without Web-Scale Data

Jun 28, 2024

BESTOW: Efficient and Streamable Speech Language Model with the Best of Two Worlds in GPT and T5

Jun 28, 2024

Spectral Codecs: Spectrogram-Based Audio Codecs for High Quality Speech Synthesis

Jun 07, 2024