speech recognition


Speech recognition is the task of identifying words spoken aloud, analyzing the voice and language, and accurately transcribing the words.

WhisperAlign: Word-Boundary-Aware ASR and WhisperX-Anchored Pyannote Diarization for Long-Form Bengali Speech

Add code
Mar 05, 2026
Viaarxiv icon

BabAR: from phoneme recognition to developmental measures of young children's speech production

Add code
Mar 05, 2026
Viaarxiv icon

Boosting ASR Robustness via Test-Time Reinforcement Learning with Audio-Text Semantic Rewards

Add code
Mar 05, 2026
Viaarxiv icon

Sequence-Level Unsupervised Training in Speech Recognition: A Theoretical Study

Add code
Mar 02, 2026
Viaarxiv icon

Using Songs to Improve Kazakh Automatic Speech Recognition

Add code
Mar 03, 2026
Viaarxiv icon

An Investigation Into Various Approaches For Bengali Long-Form Speech Transcription and Bengali Speaker Diarization

Add code
Mar 03, 2026
Viaarxiv icon

More Data, Fewer Diacritics: Scaling Arabic TTS

Add code
Mar 02, 2026
Viaarxiv icon

DARS: Dysarthria-Aware Rhythm-Style Synthesis for ASR Enhancement

Add code
Mar 02, 2026
Viaarxiv icon

Towards Orthographically-Informed Evaluation of Speech Recognition Systems for Indian Languages

Add code
Mar 01, 2026
Viaarxiv icon

RO-N3WS: Enhancing Generalization in Low-Resource ASR with Diverse Romanian Speech Benchmarks

Add code
Mar 02, 2026
Viaarxiv icon