Sreyan Ghosh

SPUR: A Plug-and-Play Framework for Integrating Spatial Audio Understanding and Reasoning into Large Audio-Language Models

Nov 13, 2025
Via arXiv

Music Flamingo: Scaling Music Understanding in Audio Language Models

Nov 13, 2025
Via arXiv

Multi-Domain Audio Question Answering Toward Acoustic Content Reasoning in The DCASE 2025 Challenge

May 12, 2025
Via arXiv

ProSE: Diffusion Priors for Speech Enhancement

Mar 09, 2025
Via arXiv

Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities

Mar 06, 2025
Via arXiv

MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark

Oct 24, 2024
Via arXiv

Do Audio-Language Models Understand Linguistic Variations?

Oct 21, 2024
Via arXiv

PAT: Parameter-Free Audio-Text Aligner to Boost Zero-Shot Audio Classification

Oct 19, 2024
Via arXiv

EH-MAM: Easy-to-Hard Masked Acoustic Modeling for Self-Supervised Speech Representation Learning

Oct 17, 2024
Via arXiv

Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation

Oct 17, 2024
Via arXiv