Speech Recognition


Speech recognition is the task of identifying words spoken aloud by analyzing the speaker's voice and language, and transcribing those words accurately as text.

LaSR: Context-Aware Speech Recognition via Latent Reasoning

May 30, 2026, via arXiv

Head-Pose-Aware Visual Speech Recognition with FiLM Modulation

May 30, 2026, via arXiv

SoulX-Transcriber: A Robust End-to-End Framework for Multi-Speaker Speech Transcription

Jun 01, 2026, via arXiv

SN-WER: Script-Normalized WER for Multi-Script Indic ASR Evaluation

Jun 01, 2026, via arXiv

Speaker-Invariant Representation Learning for Spoofing Detection via Gradient Reversal and A Variational Information Bottleneck

Jun 07, 2026, via arXiv

MURMUR: An Efficient Inference System for Long-Form ASR

May 31, 2026, via arXiv

Towards Human-Like Interactive Speech Recognition With Agentic Correction and Semantic Evaluation

May 28, 2026, via arXiv

Your Multimodal Speech Model Says I Have a Face for Radio

May 28, 2026, via arXiv

Syllabic-Structure Decoder for Automatic Speech Recognition in Vietnamese

May 27, 2026, via arXiv

Data-Efficient On-Policy Distillation for Automatic Speech Recognition

May 27, 2026, via arXiv