Speech recognition


Speech recognition is the task of identifying words spoken aloud by analyzing the speaker's voice and language, then transcribing those words accurately as text.

SE-DiCoW: Self-Enrolled Diarization-Conditioned Whisper

Jan 27, 2026
Via arXiv

Unheard in the Digital Age: Rethinking AI Bias and Speech Diversity

Jan 26, 2026
Via arXiv

Recovering Performance in Speech Emotion Recognition from Discrete Tokens via Multi-Layer Fusion and Paralinguistic Feature Integration

Jan 23, 2026
Via arXiv

Test-Time Adaptation for Speech Emotion Recognition

Jan 21, 2026
Via arXiv

Efficient Rehearsal for Continual Learning in ASR via Singular Value Tuning

Jan 26, 2026
Via arXiv

Quran-MD: A Fine-Grained Multilingual Multimodal Dataset of the Quran

Jan 25, 2026
Via arXiv

Factored Reasoning with Inner Speech and Persistent Memory for Evidence-Grounded Human-Robot Interaction

Jan 31, 2026
Via arXiv

MA-LipNet: Multi-Dimensional Attention Networks for Robust Lipreading

Jan 27, 2026
Via arXiv

AmbER$^2$: Dual Ambiguity-Aware Emotion Recognition Applied to Speech and Text

Jan 25, 2026
Via arXiv

Noise-Robust AV-ASR Using Visual Features Both in the Whisper Encoder and Decoder

Jan 26, 2026
Via arXiv