Speech Recognition


Speech recognition, also called automatic speech recognition (ASR), is the task of identifying the words in spoken audio by analyzing the speaker's voice and language, and transcribing them accurately as text.

Who Spoke What When? Evaluating Spoken Language Models for Conversational ASR with Semantic and Overlap-Aware Metrics

Mar 24, 2026
Via arXiv

MSR-HuBERT: Self-supervised Pre-training for Adaptation to Multiple Sampling Rates

Mar 24, 2026
Via arXiv

When AVSR Meets Video Conferencing: Dataset, Degradation, and the Hidden Mechanism Behind Performance Collapse

Mar 24, 2026
Via arXiv

From Content to Audience: A Multimodal Annotation Framework for Broadcast Television Analytics

Mar 24, 2026
Via arXiv

Precision-Varying Prediction (PVP): Robustifying ASR systems against adversarial attacks

Mar 23, 2026
Via arXiv

Evaluating a Multi-Agent Voice-Enabled Smart Speaker for Care Homes: A Safety-Focused Framework

Mar 24, 2026
Via arXiv

Ara-Best-RQ: Multi Dialectal Arabic SSL

Mar 23, 2026
Via arXiv

SLURP-TN : Resource for Tunisian Dialect Spoken Language Understanding

Mar 23, 2026
Via arXiv

MSP-Conversation: A Corpus for Naturalistic, Time-Continuous Emotion Recognition

Mar 23, 2026
Via arXiv

Demonstration of Adapt4Me: An Uncertainty-Aware Authoring Environment for Personalizing Automatic Speech Recognition to Non-normative Speech

Mar 20, 2026
Via arXiv