Picture for Jagadeesh Balam

Jagadeesh Balam

Efficient and Direct Duplex Modeling for Speech-to-Speech Language Model

Add code
May 21, 2025
Viaarxiv icon

Word Level Timestamp Generation for Automatic Speech Recognition and Translation

Add code
May 21, 2025
Viaarxiv icon

Granary: Speech Recognition and Translation Dataset in 25 European Languages

Add code
May 19, 2025
Viaarxiv icon

Training and Inference Efficiency of Encoder-Decoder Speech Models

Add code
Mar 07, 2025
Viaarxiv icon

NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts

Add code
Nov 08, 2024
Viaarxiv icon

Anticipating Future with Large Language Model for Simultaneous Machine Translation

Add code
Oct 29, 2024
Figure 1 for Anticipating Future with Large Language Model for Simultaneous Machine Translation
Figure 2 for Anticipating Future with Large Language Model for Simultaneous Machine Translation
Figure 3 for Anticipating Future with Large Language Model for Simultaneous Machine Translation
Figure 4 for Anticipating Future with Large Language Model for Simultaneous Machine Translation
Viaarxiv icon

VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning

Add code
Oct 23, 2024
Viaarxiv icon

Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data

Add code
Sep 30, 2024
Figure 1 for Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data
Figure 2 for Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data
Figure 3 for Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data
Figure 4 for Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data
Viaarxiv icon

Chain-of-Thought Prompting for Speech Translation

Add code
Sep 17, 2024
Figure 1 for Chain-of-Thought Prompting for Speech Translation
Figure 2 for Chain-of-Thought Prompting for Speech Translation
Figure 3 for Chain-of-Thought Prompting for Speech Translation
Figure 4 for Chain-of-Thought Prompting for Speech Translation
Viaarxiv icon

Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition

Add code
Sep 17, 2024
Figure 1 for Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition
Figure 2 for Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition
Figure 3 for Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition
Figure 4 for Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition
Viaarxiv icon