speech recognition


Speech recognition is the task of identifying words spoken aloud, analyzing the voice and language, and accurately transcribing the words.

Recovering Performance in Speech Emotion Recognition from Discrete Tokens via Multi-Layer Fusion and Paralinguistic Feature Integration

Add code
Jan 23, 2026
Viaarxiv icon

Sink or SWIM: Tackling Real-Time ASR at Scale

Add code
Jan 22, 2026
Viaarxiv icon

Test-Time Adaptation for Speech Emotion Recognition

Add code
Jan 21, 2026
Viaarxiv icon

Scaling Ambiguity: Augmenting Human Annotation in Speech Emotion Recognition with Audio-Language Models

Add code
Jan 21, 2026
Viaarxiv icon

Inverse-Hessian Regularization for Continual Learning in ASR

Add code
Jan 21, 2026
Viaarxiv icon

Deaf and Hard of Hearing Access to Intelligent Personal Assistants: Comparison of Voice-Based Options with an LLM-Powered Touch Interface

Add code
Jan 21, 2026
Viaarxiv icon

TidyVoice: A Curated Multilingual Dataset for Speaker Verification Derived from Common Voice

Add code
Jan 22, 2026
Viaarxiv icon

SoundBreak: A Systematic Study of Audio-Only Adversarial Attacks on Trimodal Models

Add code
Jan 20, 2026
Viaarxiv icon

Lost in Transcription: How Speech-to-Text Errors Derail Code Understanding

Add code
Jan 20, 2026
Viaarxiv icon

PRiSM: Benchmarking Phone Realization in Speech Models

Add code
Jan 20, 2026
Viaarxiv icon