Picture for Jagadeesh Balam

Jagadeesh Balam

Canary-1B-v2 & Parakeet-TDT-0.6B-v3: Efficient and High-Performance Models for Multilingual ASR and AST

Add code
Sep 17, 2025
Viaarxiv icon

SPGISpeech 2.0: Transcribed multi-speaker financial audio for speaker-tagged transcription

Add code
Aug 07, 2025
Figure 1 for SPGISpeech 2.0: Transcribed multi-speaker financial audio for speaker-tagged transcription
Figure 2 for SPGISpeech 2.0: Transcribed multi-speaker financial audio for speaker-tagged transcription
Figure 3 for SPGISpeech 2.0: Transcribed multi-speaker financial audio for speaker-tagged transcription
Figure 4 for SPGISpeech 2.0: Transcribed multi-speaker financial audio for speaker-tagged transcription
Viaarxiv icon

Word Level Timestamp Generation for Automatic Speech Recognition and Translation

Add code
May 21, 2025
Viaarxiv icon

Efficient and Direct Duplex Modeling for Speech-to-Speech Language Model

Add code
May 21, 2025
Viaarxiv icon

Granary: Speech Recognition and Translation Dataset in 25 European Languages

Add code
May 19, 2025
Viaarxiv icon

Training and Inference Efficiency of Encoder-Decoder Speech Models

Add code
Mar 07, 2025
Viaarxiv icon

NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts

Add code
Nov 08, 2024
Viaarxiv icon

Anticipating Future with Large Language Model for Simultaneous Machine Translation

Add code
Oct 29, 2024
Figure 1 for Anticipating Future with Large Language Model for Simultaneous Machine Translation
Figure 2 for Anticipating Future with Large Language Model for Simultaneous Machine Translation
Figure 3 for Anticipating Future with Large Language Model for Simultaneous Machine Translation
Figure 4 for Anticipating Future with Large Language Model for Simultaneous Machine Translation
Viaarxiv icon

VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning

Add code
Oct 23, 2024
Figure 1 for VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning
Figure 2 for VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning
Figure 3 for VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning
Figure 4 for VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning
Viaarxiv icon

Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data

Add code
Sep 30, 2024
Figure 1 for Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data
Figure 2 for Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data
Figure 3 for Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data
Figure 4 for Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data
Viaarxiv icon