Speech recognition


Speech recognition is the task of identifying words spoken aloud by analyzing the speaker's voice and language, then transcribing those words accurately as text.

When Denoising Hinders: Revisiting Zero-Shot ASR with SAM-Audio and Whisper

Mar 05, 2026
Via arXiv

PersianPunc: A Large-Scale Dataset and BERT-Based Approach for Persian Punctuation Restoration

Mar 05, 2026
Via arXiv

WhisperAlign: Word-Boundary-Aware ASR and WhisperX-Anchored Pyannote Diarization for Long-Form Bengali Speech

Mar 05, 2026
Via arXiv

TG-ASR: Translation-Guided Learning with Parallel Gated Cross Attention for Low-Resource Automatic Speech Recognition

Feb 25, 2026
Via arXiv

BabAR: from phoneme recognition to developmental measures of young children's speech production

Mar 05, 2026
Via arXiv

DiscoPhon: Benchmarking the Unsupervised Discovery of Phoneme Inventories With Discrete Speech Units

Mar 19, 2026
Via arXiv

Robust Long-Form Bangla Speech Processing: Automatic Speech Recognition and Speaker Diarization

Feb 25, 2026
Via arXiv

Boosting ASR Robustness via Test-Time Reinforcement Learning with Audio-Text Semantic Rewards

Mar 05, 2026
Via arXiv

More Data, Fewer Diacritics: Scaling Arabic TTS

Mar 02, 2026
Via arXiv

DARS: Dysarthria-Aware Rhythm-Style Synthesis for ASR Enhancement

Mar 02, 2026
Via arXiv