Speech recognition


Speech recognition is the task of identifying words spoken aloud by analyzing the speaker's voice and language, and transcribing those words accurately.

Learning Emotion-discriminative Representations for Zero-Shot Cross-lingual Speech Emotion Recognition

Jun 04, 2026
Via arXiv

Assessing True Generalisability of Audio-Visual Speech Recognisers

Jun 05, 2026
Via arXiv

The Lipreading Gap: Do VSR Models Perceive Visual Speech Like Human Lipreaders?

Jun 05, 2026
Via arXiv

Real-time body pose non-verbal communication with a consistency-based reliability measure

Jun 08, 2026
Via arXiv

NüshuVoice: Reviving the Voice of Endangered Nüshu with Pitch-Aware Text-to-Speech

Jun 08, 2026
Via arXiv

Geometric Second-Order Feature Correlation Learning for Self-Supervised Speech Emotion Recognition

Jun 04, 2026
Via arXiv

Age-Aware Adapter Tuning for Children's Speech Recognition

Jun 03, 2026
Via arXiv

Beyond WER: A Paired Acoustic Stress Test for Ambient Clinical Scribes

Jun 04, 2026
Via arXiv

TimeLens: On-Device Artifact Recognition with Retrieval-Augmented Question Answering for the Grand Egyptian Museum

Jun 11, 2026
Via arXiv

Towards Truly Multilingual ASR: Generalizing Code-Switching ASR to Unseen Language Pairs

Jun 04, 2026
Via arXiv