Shalini Ghosh

Multi-Stage Multi-Modal Pre-Training for Automatic Speech Recognition

Mar 28, 2024
Via arXiv

Paralinguistics-Enhanced Large Language Modeling of Spoken Dialogue

Jan 17, 2024
Via arXiv

Towards ASR Robust Spoken Language Understanding Through In-Context Learning With Word Confusion Networks

Jan 05, 2024
Via arXiv

Task Oriented Dialogue as a Catalyst for Self-Supervised Automatic Speech Recognition

Jan 04, 2024
Via arXiv

Multimodal Attention Merging for Improved Speech Recognition and Audio Event Classification

Dec 22, 2023
Via arXiv

JAB: Joint Adversarial Prompting and Belief Augmentation

Nov 16, 2023
Via arXiv

Generative Speech Recognition Error Correction with Large Language Models and Task-Activating Prompting

Oct 10, 2023
Via arXiv

Low-rank Adaptation of Large Language Model Rescoring for Parameter-Efficient Speech Recognition

Sep 26, 2023
Via arXiv

FLIRT: Feedback Loop In-context Red Teaming

Aug 08, 2023
Via arXiv

Scalable and Accurate Self-supervised Multimodal Representation Learning without Aligned Video and Text Data

Apr 04, 2023
Via arXiv