Speech recognition


Speech recognition is the task of identifying spoken words by analyzing the speaker's voice and language, then transcribing them accurately as text.

BanglaRobustNet: A Hybrid Denoising-Attention Architecture for Robust Bangla Speech Recognition

Jan 25, 2026

MA-LipNet: Multi-Dimensional Attention Networks for Robust Lipreading

Jan 27, 2026

dLLM-ASR: A Faster Diffusion LLM-based Framework for Speech Recognition

Jan 25, 2026

Language Family Matters: Evaluating LLM-Based ASR Across Linguistic Boundaries

Jan 26, 2026

Unheard in the Digital Age: Rethinking AI Bias and Speech Diversity

Jan 26, 2026

Efficient Rehearsal for Continual Learning in ASR via Singular Value Tuning

Jan 26, 2026

SpatialEmb: Extract and Encode Spatial Information for 1-Stage Multi-channel Multi-speaker ASR on Arbitrary Microphone Arrays

Jan 25, 2026

Noise-Robust AV-ASR Using Visual Features Both in the Whisper Encoder and Decoder

Jan 26, 2026

Quran-MD: A Fine-Grained Multilingual Multimodal Dataset of the Quran

Jan 25, 2026

AmbER$^2$: Dual Ambiguity-Aware Emotion Recognition Applied to Speech and Text

Jan 25, 2026