Speech Recognition


Speech recognition is the task of identifying the words in spoken audio by analyzing the voice and language, and transcribing those words accurately.

CS3-Bench: Evaluating and Enhancing Speech-to-Speech LLMs for Mandarin-English Code-Switching

Oct 09, 2025

EnvX: Agentize Everything with Agentic AI

Sep 09, 2025

PARCO: Phoneme-Augmented Robust Contextual ASR via Contrastive Entity Disambiguation

Sep 04, 2025

Objective and Subjective Evaluation of Diffusion-Based Speech Enhancement for Dysarthric Speech

Aug 25, 2025

Cloning a Conversational Voice AI Agent from Call Recording Datasets for Telesales

Sep 05, 2025

Spoken in Jest, Detected in Earnest: A Systematic Review of Sarcasm Recognition -- Multimodal Fusion, Challenges, and Future Prospects

Sep 04, 2025

Evaluating the Representation of Vowels in Wav2Vec Feature Extractor: A Layer-Wise Analysis Using MFCCs

Aug 25, 2025

Emotion-Aware Speech Generation with Character-Specific Voices for Comics

Sep 18, 2025

Interpretable Modeling of Articulatory Temporal Dynamics from real-time MRI for Phoneme Recognition

Sep 19, 2025

Hybrid Decoding: Rapid Pass and Selective Detailed Correction for Sequence Models

Aug 27, 2025