Picture for Yashesh Gaur

Yashesh Gaur

Speech ReaLLM -- Real-time Streaming Speech Recognition with Multimodal LLMs by Teaching the Flow of Time

Add code
Jun 13, 2024
Viaarxiv icon

COSMIC: Data Efficient Instruction-tuning For Speech In-Context Learning

Add code
Nov 03, 2023
Figure 1 for COSMIC: Data Efficient Instruction-tuning For Speech In-Context Learning
Figure 2 for COSMIC: Data Efficient Instruction-tuning For Speech In-Context Learning
Figure 3 for COSMIC: Data Efficient Instruction-tuning For Speech In-Context Learning
Figure 4 for COSMIC: Data Efficient Instruction-tuning For Speech In-Context Learning
Viaarxiv icon

Leveraging Timestamp Information for Serialized Joint Streaming Recognition and Translation

Add code
Oct 23, 2023
Figure 1 for Leveraging Timestamp Information for Serialized Joint Streaming Recognition and Translation
Figure 2 for Leveraging Timestamp Information for Serialized Joint Streaming Recognition and Translation
Figure 3 for Leveraging Timestamp Information for Serialized Joint Streaming Recognition and Translation
Viaarxiv icon

On decoder-only architecture for speech-to-text and large language model integration

Add code
Jul 14, 2023
Figure 1 for On decoder-only architecture for speech-to-text and large language model integration
Figure 2 for On decoder-only architecture for speech-to-text and large language model integration
Figure 3 for On decoder-only architecture for speech-to-text and large language model integration
Viaarxiv icon

Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments

Add code
Jul 07, 2023
Figure 1 for Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments
Figure 2 for Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments
Figure 3 for Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments
Figure 4 for Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments
Viaarxiv icon

VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation

Add code
May 25, 2023
Figure 1 for VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation
Figure 2 for VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation
Figure 3 for VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation
Figure 4 for VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation
Viaarxiv icon

Streaming, fast and accurate on-device Inverse Text Normalization for Automatic Speech Recognition

Add code
Nov 07, 2022
Figure 1 for Streaming, fast and accurate on-device Inverse Text Normalization for Automatic Speech Recognition
Figure 2 for Streaming, fast and accurate on-device Inverse Text Normalization for Automatic Speech Recognition
Figure 3 for Streaming, fast and accurate on-device Inverse Text Normalization for Automatic Speech Recognition
Figure 4 for Streaming, fast and accurate on-device Inverse Text Normalization for Automatic Speech Recognition
Viaarxiv icon

LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers

Add code
Nov 05, 2022
Figure 1 for LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers
Figure 2 for LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers
Figure 3 for LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers
Figure 4 for LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers
Viaarxiv icon

CTCBERT: Advancing Hidden-unit BERT with CTC Objectives

Add code
Oct 16, 2022
Figure 1 for CTCBERT: Advancing Hidden-unit BERT with CTC Objectives
Figure 2 for CTCBERT: Advancing Hidden-unit BERT with CTC Objectives
Figure 3 for CTCBERT: Advancing Hidden-unit BERT with CTC Objectives
Figure 4 for CTCBERT: Advancing Hidden-unit BERT with CTC Objectives
Viaarxiv icon

Acoustic-aware Non-autoregressive Spell Correction with Mask Sample Decoding

Add code
Oct 16, 2022
Figure 1 for Acoustic-aware Non-autoregressive Spell Correction with Mask Sample Decoding
Figure 2 for Acoustic-aware Non-autoregressive Spell Correction with Mask Sample Decoding
Figure 3 for Acoustic-aware Non-autoregressive Spell Correction with Mask Sample Decoding
Figure 4 for Acoustic-aware Non-autoregressive Spell Correction with Mask Sample Decoding
Viaarxiv icon