Speech Recognition


Speech recognition is the task of identifying the words spoken in an audio signal by analyzing the speaker's voice and language, and transcribing those words accurately as text.

Speech-Based Depressive Mood Detection in the Presence of Multiple Sclerosis: A Cross-Corpus and Cross-Lingual Study

Aug 25, 2025
Via arXiv

What do Speech Foundation Models Learn? Analysis and Applications

Aug 17, 2025
Via arXiv

SPGISpeech 2.0: Transcribed multi-speaker financial audio for speaker-tagged transcription

Aug 07, 2025
Via arXiv

Lessons Learnt: Revisit Key Training Strategies for Effective Speech Emotion Recognition in the Wild

Aug 10, 2025
Via arXiv

EmoTale: An Enacted Speech-emotion Dataset in Danish

Aug 20, 2025
Via arXiv

HARNESS: Lightweight Distilled Arabic Speech Foundation Models

Sep 18, 2025
Via arXiv

Revealing the Role of Audio Channels in ASR Performance Degradation

Aug 12, 2025
Via arXiv

CarelessWhisper: Turning Whisper into a Causal Streaming Model

Aug 17, 2025
Via arXiv

FlexCTC: GPU-powered CTC Beam Decoding With Advanced Contextual Abilities

Aug 13, 2025
Via arXiv

Out of the Box, into the Clinic? Evaluating State-of-the-Art ASR for Clinical Applications for Older Adults

Aug 12, 2025
Via arXiv