Speech recognition


Speech recognition is the task of identifying words spoken aloud by analyzing the speaker's voice and language, and transcribing those words accurately as text.

ProSarc: Prosody-Aware Sarcasm Recognition Framework via Temporal Prosodic Incongruity

Jun 04, 2026

SpeakerCard-1M: An Evidence-Grounded Speaker Card Corpus for In-the-Wild Speaker Verification

Jun 03, 2026

Towards Human-Like Interactive Speech Recognition With Agentic Correction and Semantic Evaluation

May 28, 2026

Your Multimodal Speech Model Says I Have a Face for Radio

May 28, 2026

Scaling Conversational Hungarian ASR: The BEA-Dialogue+ Corpus

May 29, 2026

TokTalk: Expressive Real-time Facial Animation from Audio-LLM Tokens

May 29, 2026

Syllabic-Structure Decoder for Automatic Speech Recognition in Vietnamese

May 27, 2026

Data-Efficient On-Policy Distillation for Automatic Speech Recognition

May 27, 2026

Diffusion Large Language Models for Visual Speech Recognition

May 27, 2026

Deep Binarized Photonic Reservoir Computing for Ultrafast Multimedia Signal Processing

May 28, 2026