speech recognition


Speech recognition is the task of identifying words spoken aloud, analyzing the voice and language, and accurately transcribing the words.

JAL-Turn: Joint Acoustic-Linguistic Modeling for Real-Time and Robust Turn-Taking Detection in Full-Duplex Spoken Dialogue Systems

Add code
Mar 27, 2026
Viaarxiv icon

Back to Basics: Revisiting ASR in the Age of Voice Agents

Add code
Mar 26, 2026
Viaarxiv icon

Prompt Amplification and Zero-Shot Late Fusion in Audio-Language Models for Speech Emotion Recognition

Add code
Mar 24, 2026
Viaarxiv icon

Cascade-Free Mandarin Visual Speech Recognition via Semantic-Guided Cross-Representation Alignment

Add code
Mar 23, 2026
Viaarxiv icon

From Oracle to Noisy Context: Mitigating Contextual Exposure Bias in Speech-LLMs

Add code
Mar 25, 2026
Viaarxiv icon

Crab: Multi Layer Contrastive Supervision to Improve Speech Emotion Recognition Under Both Acted and Natural Speech Condition

Add code
Mar 24, 2026
Viaarxiv icon

An Empirical Recipe for Universal Phone Recognition

Add code
Mar 30, 2026
Viaarxiv icon

Who Spoke What When? Evaluating Spoken Language Models for Conversational ASR with Semantic and Overlap-Aware Metrics

Add code
Mar 24, 2026
Viaarxiv icon

MSR-HuBERT: Self-supervised Pre-training for Adaptation to Multiple Sampling Rates

Add code
Mar 24, 2026
Viaarxiv icon

Evaluating Interactive 2D Visualization as a Sample Selection Strategy for Biomedical Time-Series Data Annotation

Add code
Mar 27, 2026
Viaarxiv icon