Speech Recognition


Speech recognition is the task of identifying the words in spoken audio by analyzing the speaker's voice and the language, and transcribing those words accurately as text.

Towards Orthographically-Informed Evaluation of Speech Recognition Systems for Indian Languages

Mar 01, 2026
Via arXiv

ACES: Accent Subspaces for Coupling, Explanations, and Stress-Testing in Automatic Speech Recognition

Feb 28, 2026
Via arXiv

SpectroFusion-ViT: A Lightweight Transformer for Speech Emotion Recognition Using Harmonic Mel-Chroma Fusion

Feb 28, 2026
Via arXiv

Polynomial Mixing for Efficient Self-supervised Speech Encoders

Feb 28, 2026
Via arXiv

Whisper-MLA: Reducing GPU Memory Consumption of ASR Models based on MHA2MLA Conversion

Feb 28, 2026
Via arXiv

Dialect and Gender Bias in YouTube's Spanish Captioning System

Feb 27, 2026
Via arXiv

Whisper-RIR-Mega: A Paired Clean-Reverberant Speech Benchmark for ASR Robustness to Room Acoustics

Feb 27, 2026
Via arXiv

Chunk-wise Attention Transducers for Fast and Accurate Streaming Speech-to-Text

Feb 27, 2026
Via arXiv

DashengTokenizer: One layer is enough for unified audio understanding and generation

Feb 27, 2026
Via arXiv

A Holistic Framework for Robust Bangla ASR and Speaker Diarization with Optimized VAD and CTC Alignment

Feb 26, 2026
Via arXiv