Speech recognition


Speech recognition is the task of identifying words spoken aloud by analyzing the speaker's voice and language, then transcribing those words accurately as text.

Decoding the Ear: A Framework for Objectifying Expressiveness from Human Preference Through Efficient Alignment

Oct 23, 2025

How much speech data is necessary for ASR in African languages? An evaluation of data scaling in Kinyarwanda and Kikuyu

Oct 08, 2025

EvolveCaptions: Empowering DHH Users Through Real-Time Collaborative Captioning

Oct 02, 2025

Interpreting the Role of Visemes in Audio-Visual Speech Recognition

Sep 19, 2025

Multi-Channel Differential ASR for Robust Wearer Speech Recognition on Smart Glasses

Sep 17, 2025

UMA-Split: unimodal aggregation for both English and Mandarin non-autoregressive speech recognition

Sep 18, 2025

State-of-the-Art Dysarthric Speech Recognition with MetaICL for on-the-fly Personalization

Sep 19, 2025

Thinking in cocktail party: Chain-of-Thought and reinforcement learning for target speaker automatic speech recognition

Sep 19, 2025

EmoQ: Speech Emotion Recognition via Speech-Aware Q-Former and Large Language Model

Sep 19, 2025

Layer-wise Minimal Pair Probing Reveals Contextual Grammatical-Conceptual Hierarchy in Speech Representations

Sep 19, 2025