Speech recognition


Speech recognition is the task of identifying words spoken aloud by analyzing the speaker's voice and language, and transcribing those words accurately as text.

Duration Aware Scheduling for ASR Serving Under Workload Drift

Mar 11, 2026
Via arXiv

NLE: Non-autoregressive LLM-based ASR by Transcript Editing

Mar 09, 2026
Via arXiv

SCENEBench: An Audio Understanding Benchmark Grounded in Assistive and Industrial Use Cases

Mar 10, 2026
Via arXiv

Ramsa: A Large Sociolinguistically Rich Emirati Arabic Speech Corpus for ASR and TTS

Mar 09, 2026
Via arXiv

Nwāchā Munā: A Devanagari Speech Corpus and Proximal Transfer Benchmark for Nepal Bhasha ASR

Mar 08, 2026
Via arXiv

Reading the Mood Behind Words: Integrating Prosody-Derived Emotional Context into Socially Responsive VR Agents

Mar 10, 2026
Via arXiv

Disentangling Reasoning in Large Audio-Language Models for Ambiguous Emotion Prediction

Mar 09, 2026
Via arXiv

PersianPunc: A Large-Scale Dataset and BERT-Based Approach for Persian Punctuation Restoration

Mar 05, 2026
Via arXiv

WhisperAlign: Word-Boundary-Aware ASR and WhisperX-Anchored Pyannote Diarization for Long-Form Bengali Speech

Mar 05, 2026
Via arXiv

Boosting ASR Robustness via Test-Time Reinforcement Learning with Audio-Text Semantic Rewards

Mar 05, 2026
Via arXiv