Yashesh Gaur

Jack

Textless Streaming Speech-to-Speech Translation using Semantic Speech Tokens

Oct 04, 2024
Via arXiv

Frozen Large Language Models Can Perceive Paralinguistic Aspects of Speech

Oct 02, 2024
Via arXiv

The Llama 3 Herd of Models

Jul 31, 2024
Via arXiv

Speech ReaLLM -- Real-time Streaming Speech Recognition with Multimodal LLMs by Teaching the Flow of Time

Jun 13, 2024
Via arXiv

COSMIC: Data Efficient Instruction-tuning For Speech In-Context Learning

Nov 03, 2023
Via arXiv

Leveraging Timestamp Information for Serialized Joint Streaming Recognition and Translation

Oct 23, 2023
Via arXiv

On decoder-only architecture for speech-to-text and large language model integration

Jul 14, 2023
Via arXiv

Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments

Jul 07, 2023
Via arXiv

VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation

May 25, 2023
Via arXiv

Streaming, fast and accurate on-device Inverse Text Normalization for Automatic Speech Recognition

Nov 07, 2022
Via arXiv