Rita Singh

VerLM: Explaining Face Verification Using Natural Language

Jan 05, 2026
Via arXiv

OleSpeech-IV: A Large-Scale Multispeaker and Multilingual Conversational Speech Dataset with Diverse Topics

Sep 04, 2025
Via arXiv

Deciphering GunType Hierarchy through Acoustic Analysis of Gunshot Recordings

Jun 25, 2025
Via arXiv

CoLMbo: Speaker Language Model for Descriptive Profiling

Jun 11, 2025
Via arXiv

CAARMA: Class Augmentation with Adversarial Mixup Regularization

Mar 20, 2025
Via arXiv

A New Benchmark for Few-Shot Class-Incremental Learning: Redefining the Upper Bound

Mar 13, 2025
Via arXiv

Mellow: a small audio language model for reasoning

Mar 11, 2025
Via arXiv

Lost in Transcription, Found in Distribution Shift: Demystifying Hallucination in Speech Foundation Models

Feb 18, 2025
Via arXiv

On the Robust Approximation of ASR Metrics

Feb 18, 2025
Via arXiv

ADIFF: Explaining audio difference using natural language

Feb 06, 2025
Via arXiv