Speech Recognition


Speech recognition is the task of identifying words spoken aloud by analyzing the voice and language, and transcribing those words accurately.

AD-AVSR: Asymmetric Dual-stream Enhancement for Robust Audio-Visual Speech Recognition

Aug 11, 2025

Zero-shot Context Biasing with Trie-based Decoding using Synthetic Multi-Pronunciation

Aug 25, 2025

UniCoM: A Universal Code-Switching Speech Generator

Aug 21, 2025

DeCRED: Decoder-Centric Regularization for Encoder-Decoder Based Speech Recognition

Aug 12, 2025

What do Speech Foundation Models Learn? Analysis and Applications

Aug 17, 2025

Speech-Based Depressive Mood Detection in the Presence of Multiple Sclerosis: A Cross-Corpus and Cross-Lingual Study

Aug 25, 2025

EmoTale: An Enacted Speech-emotion Dataset in Danish

Aug 20, 2025

Revealing the Role of Audio Channels in ASR Performance Degradation

Aug 12, 2025

CarelessWhisper: Turning Whisper into a Causal Streaming Model

Aug 17, 2025

Out of the Box, into the Clinic? Evaluating State-of-the-Art ASR for Clinical Applications for Older Adults

Aug 12, 2025