Speech recognition


Speech recognition is the task of identifying words spoken aloud by analyzing the voice signal and the language, and transcribing those words accurately as text.

PersianPunc: A Large-Scale Dataset and BERT-Based Approach for Persian Punctuation Restoration

Mar 05, 2026
Via arXiv

WhisperAlign: Word-Boundary-Aware ASR and WhisperX-Anchored Pyannote Diarization for Long-Form Bengali Speech

Mar 05, 2026
Via arXiv

Towards Orthographically-Informed Evaluation of Speech Recognition Systems for Indian Languages

Mar 01, 2026
Via arXiv

BabAR: from phoneme recognition to developmental measures of young children's speech production

Mar 05, 2026
Via arXiv

Boosting ASR Robustness via Test-Time Reinforcement Learning with Audio-Text Semantic Rewards

Mar 05, 2026
Via arXiv

G-STAR: End-to-End Global Speaker-Tracking Attributed Recognition

Mar 11, 2026
Via arXiv

Acoustic and Semantic Modeling of Emotion in Spoken Language

Mar 10, 2026
Via arXiv

ACES: Accent Subspaces for Coupling, Explanations, and Stress-Testing in Automatic Speech Recognition

Feb 28, 2026
Via arXiv

Multimodal Emotion Recognition via Bi-directional Cross-Attention and Temporal Modeling

Mar 12, 2026
Via arXiv

More Data, Fewer Diacritics: Scaling Arabic TTS

Mar 02, 2026
Via arXiv