speech recognition


Speech recognition is the task of identifying words spoken aloud by analyzing the voice and language in the audio, and accurately transcribing those words as text.

Interpretable Modeling of Articulatory Temporal Dynamics from real-time MRI for Phoneme Recognition

Sep 19, 2025
Via arXiv

Emotion-Aware Speech Generation with Character-Specific Voices for Comics

Sep 18, 2025
Via arXiv

Behind the Scenes: Mechanistic Interpretability of LoRA-adapted Whisper for Speech Emotion Recognition

Sep 11, 2025
Via arXiv

Few-shot Personalization via In-Context Learning for Speech Emotion Recognition based on Speech-Language Model

Sep 10, 2025
Via arXiv

Identifying and Calibrating Overconfidence in Noisy Speech Recognition

Sep 08, 2025
Via arXiv

Joint Learning using Mixture-of-Expert-Based Representation for Enhanced Speech Generation and Robust Emotion Recognition

Sep 10, 2025
Via arXiv

Streaming Sequence-to-Sequence Learning with Delayed Streams Modeling

Sep 10, 2025
Via arXiv

HARNESS: Lightweight Distilled Arabic Speech Foundation Models

Sep 18, 2025
Via arXiv

A Bottom-up Framework with Language-universal Speech Attribute Modeling for Syllable-based ASR

Sep 09, 2025
Via arXiv

EnvX: Agentize Everything with Agentic AI

Sep 09, 2025
Via arXiv