Speech recognition


Speech recognition is the task of identifying the words in spoken audio by analyzing the speaker's voice and language, then transcribing those words accurately into text.

Empowering Video Translation using Multimodal Large Language Models

Apr 13, 2026

Cross-Cultural Bias in Mel-Scale Representations: Evidence and Alternatives from Speech and Music

Apr 12, 2026

Data Selection Effects on Self-Supervised Learning of Audio Representations for French Audiovisual Broadcasts

Apr 10, 2026

Few-Shot Contrastive Adaptation for Audio Abuse Detection in Low-Resource Indic Languages

Apr 10, 2026

Semantic-Emotional Resonance Embedding: A Semi-Supervised Paradigm for Cross-Lingual Speech Emotion Recognition

Apr 08, 2026

Demographic and Linguistic Bias Evaluation in Omnimodal Language Models

Apr 11, 2026

When Does Data Augmentation Help? Evaluating LLM and Back-Translation Methods for Hausa and Fongbe NLP

Apr 14, 2026

Rethinking Entropy Allocation in LLM-based ASR: Understanding the Dynamics between Speech Encoders and LLMs

Apr 09, 2026

Script Collapse in Multilingual ASR: Defining and Measuring Script Fidelity Rate

Apr 09, 2026

AfriVoices-KE: A Multilingual Speech Dataset for Kenyan Languages

Apr 09, 2026