Speech Recognition


Speech recognition is the task of identifying words spoken aloud by analyzing the voice signal and the language, and transcribing those words accurately as text.

The USTC-NERCSLIP Systems for the CHiME-9 MCoRec Challenge

Mar 02, 2026
Via arXiv

TG-ASR: Translation-Guided Learning with Parallel Gated Cross Attention for Low-Resource Automatic Speech Recognition

Feb 25, 2026
Via arXiv

End-to-End Simultaneous Dysarthric Speech Reconstruction with Frame-Level Adaptor and Multiple Wait-k Knowledge Distillation

Mar 02, 2026
Via arXiv

SpectroFusion-ViT: A Lightweight Transformer for Speech Emotion Recognition Using Harmonic Mel-Chroma Fusion

Feb 28, 2026
Via arXiv

The Patrologia Graeca Corpus: OCR, Annotation, and Open Release of Noisy Nineteenth-Century Polytonic Greek Editions

Mar 10, 2026
Via arXiv

Robust Long-Form Bangla Speech Processing: Automatic Speech Recognition and Speaker Diarization

Feb 25, 2026
Via arXiv

Polynomial Mixing for Efficient Self-supervised Speech Encoders

Feb 28, 2026
Via arXiv

Dialect and Gender Bias in YouTube's Spanish Captioning System

Feb 27, 2026
Via arXiv

Whisper-MLA: Reducing GPU Memory Consumption of ASR Models based on MHA2MLA Conversion

Feb 28, 2026
Via arXiv

Whisper-RIR-Mega: A Paired Clean-Reverberant Speech Benchmark for ASR Robustness to Room Acoustics

Feb 27, 2026
Via arXiv