Speech recognition


Speech recognition is the task of identifying words spoken aloud by analyzing the speaker's voice and language, and transcribing those words accurately as text.

From Weak Labels to Strong Results: Utilizing 5,000 Hours of Noisy Classroom Transcripts with Minimal Accurate Data

May 20, 2025 (via arXiv)

Differentiable K-means for Fully-optimized Discrete Token-based ASR

May 22, 2025 (via arXiv)

An End-to-End Approach for Child Reading Assessment in the Xhosa Language

May 23, 2025 (via arXiv)

Token-Level Logits Matter: A Closer Look at Speech Foundation Models for Ambiguous Emotion Recognition

May 24, 2025 (via arXiv)

Towards disentangling the contributions of articulation and acoustics in multimodal phoneme recognition

May 29, 2025 (via arXiv)

Prosodically Enhanced Foreign Accent Simulation by Discrete Token-based Resynthesis Only with Native Speech Corpora

May 22, 2025 (via arXiv)

SoccerChat: Integrating Multimodal Data for Enhanced Soccer Game Understanding

May 22, 2025 (via arXiv)

Bridging Speech Emotion Recognition and Personality: Dataset and Temporal Interaction Condition Network

May 20, 2025 (via arXiv)

HausaNLP: Current Status, Challenges and Future Directions for Hausa Natural Language Processing

May 20, 2025 (via arXiv)

Scaling and Enhancing LLM-based AVSR: A Sparse Mixture of Projectors Approach

May 21, 2025 (via arXiv)