speech recognition


Speech recognition is the task of identifying words spoken aloud, analyzing the voice and language, and accurately transcribing the words.

SW-ASR: A Context-Aware Hybrid ASR Pipeline for Robust Single Word Speech Recognition

Add code
Jan 28, 2026
Viaarxiv icon

Do we really need Self-Attention for Streaming Automatic Speech Recognition?

Add code
Jan 27, 2026
Viaarxiv icon

Uncertainty-Aware Multimodal Emotion Recognition through Dirichlet Parameterization

Add code
Feb 09, 2026
Viaarxiv icon

Decoding Ambiguous Emotions with Test-Time Scaling in Audio-Language Models

Add code
Feb 01, 2026
Viaarxiv icon

Semantics-Aware Generative Latent Data Augmentation for Learning in Low-Resource Domains

Add code
Feb 02, 2026
Viaarxiv icon

Dynamic Multi-Expert Projectors with Stabilized Routing for Multilingual Speech Recognition

Add code
Jan 27, 2026
Viaarxiv icon

CALM: Joint Contextual Acoustic-Linguistic Modeling for Personalization of Multi-Speaker ASR

Add code
Jan 30, 2026
Viaarxiv icon

OCR-Enhanced Multimodal ASR Can Read While Listening

Add code
Jan 26, 2026
Viaarxiv icon

Enhancing Speech Emotion Recognition using Dynamic Spectral Features and Kalman Smoothing

Add code
Jan 26, 2026
Viaarxiv icon

BanglaRobustNet: A Hybrid Denoising-Attention Architecture for Robust Bangla Speech Recognition

Add code
Jan 25, 2026
Viaarxiv icon