Speech Recognition


Speech recognition is the task of identifying words spoken aloud by analyzing the speaker's voice and language, and transcribing those words accurately as text.

UMA-Split: unimodal aggregation for both English and Mandarin non-autoregressive speech recognition

Sep 18, 2025
Via arXiv

Thinking in cocktail party: Chain-of-Thought and reinforcement learning for target speaker automatic speech recognition

Sep 19, 2025
Via arXiv

Multi-Channel Differential ASR for Robust Wearer Speech Recognition on Smart Glasses

Sep 17, 2025
Via arXiv

EmoQ: Speech Emotion Recognition via Speech-Aware Q-Former and Large Language Model

Sep 19, 2025
Via arXiv

Layer-wise Minimal Pair Probing Reveals Contextual Grammatical-Conceptual Hierarchy in Speech Representations

Sep 19, 2025
Via arXiv

GLip: A Global-Local Integrated Progressive Framework for Robust Visual Speech Recognition

Sep 19, 2025
Via arXiv

Rethinking Cross-Corpus Speech Emotion Recognition Benchmarking: Are Paralinguistic Pre-Trained Representations Sufficient?

Sep 19, 2025
Via arXiv

BiRQ: Bi-Level Self-Labeling Random Quantization for Self-Supervised Speech Recognition

Sep 18, 2025
Via arXiv

EMO-RL: Emotion-Rule-Based Reinforcement Learning Enhanced Audio-Language Model for Generalized Speech Emotion Recognition

Sep 19, 2025
Via arXiv

PAC: Pronunciation-Aware Contextualized Large Language Model-based Automatic Speech Recognition

Sep 16, 2025
Via arXiv