Speech Recognition


Speech recognition is the task of converting spoken language into text: identifying the words in an audio signal, using both acoustic and linguistic cues, and transcribing them accurately.

CS-FLEURS: A Massively Multilingual and Code-Switched Speech Dataset

Sep 17, 2025 (via arXiv)

Are Multimodal Foundation Models All That Is Needed for Emofake Detection?

Sep 19, 2025 (via arXiv)

Identifying and Calibrating Overconfidence in Noisy Speech Recognition

Sep 08, 2025 (via arXiv)

Few-shot Personalization via In-Context Learning for Speech Emotion Recognition based on Speech-Language Model

Sep 10, 2025 (via arXiv)

Behind the Scenes: Mechanistic Interpretability of LoRA-adapted Whisper for Speech Emotion Recognition

Sep 11, 2025 (via arXiv)

Joint Learning using Mixture-of-Expert-Based Representation for Enhanced Speech Generation and Robust Emotion Recognition

Sep 10, 2025 (via arXiv)

Interpretable Modeling of Articulatory Temporal Dynamics from real-time MRI for Phoneme Recognition

Sep 19, 2025 (via arXiv)

Emotion-Aware Speech Generation with Character-Specific Voices for Comics

Sep 18, 2025 (via arXiv)

Streaming Sequence-to-Sequence Learning with Delayed Streams Modeling

Sep 10, 2025 (via arXiv)

A Bottom-up Framework with Language-universal Speech Attribute Modeling for Syllable-based ASR

Sep 09, 2025 (via arXiv)