speech recognition


Speech recognition is the task of identifying words spoken aloud, analyzing the voice and language, and accurately transcribing the words.

Recovering Performance in Speech Emotion Recognition from Discrete Tokens via Multi-Layer Fusion and Paralinguistic Feature Integration

Add code
Jan 23, 2026
Viaarxiv icon

Test-Time Adaptation for Speech Emotion Recognition

Add code
Jan 21, 2026
Viaarxiv icon

End-to-End Joint ASR and Speaker Role Diarization with Child-Adult Interactions

Add code
Jan 25, 2026
Viaarxiv icon

From Human Speech to Ocean Signals: Transferring Speech Large Models for Underwater Acoustic Target Recognition

Add code
Jan 26, 2026
Viaarxiv icon

Benchmarking von ASR-Modellen im deutschen medizinischen Kontext: Eine Leistungsanalyse anhand von Anamnesegesprächen

Add code
Jan 23, 2026
Viaarxiv icon

Scaling Ambiguity: Augmenting Human Annotation in Speech Emotion Recognition with Audio-Language Models

Add code
Jan 21, 2026
Viaarxiv icon

Typhoon ASR Real-time: FastConformer-Transducer for Thai Automatic Speech Recognition

Add code
Jan 19, 2026
Viaarxiv icon

Purification Before Fusion: Toward Mask-Free Speech Enhancement for Robust Audio-Visual Speech Recognition

Add code
Jan 18, 2026
Viaarxiv icon

A Baseline Multimodal Approach to Emotion Recognition in Conversations

Add code
Jan 31, 2026
Viaarxiv icon

SSVD-O: Parameter-Efficient Fine-Tuning with Structured SVD for Speech Recognition

Add code
Jan 18, 2026
Viaarxiv icon