Ashish Seth

GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities

Jun 17, 2024

LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition

Jun 06, 2024

FusDom: Combining In-Domain and Out-of-Domain Knowledge for Continuous Self-Supervised Learning

Dec 20, 2023

Stable Distillation: Regularizing Continued Pre-training for Low-Resource Automatic Speech Recognition

Dec 20, 2023

CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models

Oct 12, 2023

DeAR: Debiasing Vision-Language Models with Additive Residuals

Mar 18, 2023

UNFUSED: UNsupervised Finetuning Using SElf supervised Distillation

Mar 10, 2023

MAST: Multiscale Audio Spectrogram Transformers

Nov 02, 2022

SLICER: Learning universal audio representations using low-resource self-supervised pre-training

Nov 02, 2022

Technology Pipeline for Large Scale Cross-Lingual Dubbing of Lecture Videos into Multiple Indian Languages

Nov 01, 2022