Speech Recognition


Speech recognition is the task of identifying words spoken aloud by analyzing the speaker's voice and language, and transcribing those words accurately.

SCENEBench: An Audio Understanding Benchmark Grounded in Assistive and Industrial Use Cases

Mar 10, 2026
Via arXiv

Sequence-Level Unsupervised Training in Speech Recognition: A Theoretical Study

Mar 02, 2026
Via arXiv

NLE: Non-autoregressive LLM-based ASR by Transcript Editing

Mar 09, 2026
Via arXiv

Federated Heterogeneous Language Model Optimization for Hybrid Automatic Speech Recognition

Mar 05, 2026
Via arXiv

Ramsa: A Large Sociolinguistically Rich Emirati Arabic Speech Corpus for ASR and TTS

Mar 09, 2026
Via arXiv

Robust LLM-based Audio-Visual Speech Recognition with Sparse Modality Alignment and Visual Unit-Guided Refinement

Mar 04, 2026
Via arXiv

Beyond Word Error Rate: Auditing the Diversity Tax in Speech Recognition through Dataset Cartography

Mar 05, 2026
Via arXiv

Reading the Mood Behind Words: Integrating Prosody-Derived Emotional Context into Socially Responsive VR Agents

Mar 10, 2026
Via arXiv

Nwāchā Munā: A Devanagari Speech Corpus and Proximal Transfer Benchmark for Nepal Bhasha ASR

Mar 08, 2026
Via arXiv

Disentangling Reasoning in Large Audio-Language Models for Ambiguous Emotion Prediction

Mar 09, 2026
Via arXiv