Speech Recognition


Speech recognition is the task of identifying words spoken aloud by analyzing the speaker's voice and language, and transcribing those words accurately as text.

Training-Free Intelligibility-Guided Observation Addition for Noisy ASR

Feb 24, 2026
Via arXiv

Chunk-wise Attention Transducers for Fast and Accurate Streaming Speech-to-Text

Feb 27, 2026
Via arXiv

Whisper-MLA: Reducing GPU Memory Consumption of ASR Models based on MHA2MLA Conversion

Feb 28, 2026
Via arXiv

A Holistic Framework for Robust Bangla ASR and Speaker Diarization with Optimized VAD and CTC Alignment

Feb 26, 2026
Via arXiv

Acoustic and Semantic Modeling of Emotion in Spoken Language

Mar 10, 2026
Via arXiv

G-STAR: End-to-End Global Speaker-Tracking Attributed Recognition

Mar 11, 2026
Via arXiv

Efficient Dialect-Aware Modeling and Conditioning for Low-Resource Taiwanese Hakka Speech Processing

Feb 26, 2026
Via arXiv

Make It Hard to Hear, Easy to Learn: Long-Form Bengali ASR and Speaker Diarization via Extreme Augmentation and Perfect Alignment

Feb 26, 2026
Via arXiv

Multimodal Emotion Recognition via Bi-directional Cross-Attention and Temporal Modeling

Mar 12, 2026
Via arXiv

Beyond Deep Learning: Speech Segmentation and Phone Classification with Neural Assemblies

Mar 11, 2026
Via arXiv