Speech Recognition


Speech recognition is the task of identifying words spoken aloud by analyzing the speaker's voice and language, then transcribing those words accurately as text.

Contextualized Token Discrimination for Speech Search Query Correction

Sep 04, 2025

Designing Practical Models for Isolated Word Visual Speech Recognition

Aug 25, 2025

EnvX: Agentize Everything with Agentic AI

Sep 09, 2025

Speech Emotion Recognition via Entropy-Aware Score Selection

Aug 28, 2025

PARCO: Phoneme-Augmented Robust Contextual ASR via Contrastive Entity Disambiguation

Sep 04, 2025

Cloning a Conversational Voice AI Agent from Call Recording Datasets for Telesales

Sep 05, 2025

Spoken in Jest, Detected in Earnest: A Systematic Review of Sarcasm Recognition -- Multimodal Fusion, Challenges, and Future Prospects

Sep 04, 2025

Objective and Subjective Evaluation of Diffusion-Based Speech Enhancement for Dysarthric Speech

Aug 25, 2025

Emotion-Aware Speech Generation with Character-Specific Voices for Comics

Sep 18, 2025

Interpretable Modeling of Articulatory Temporal Dynamics from real-time MRI for Phoneme Recognition

Sep 19, 2025