Speech Recognition


Speech recognition is the task of identifying words spoken aloud by analyzing the speaker's voice and language, and transcribing those words accurately as text.

In-Context Learning Boosts Speech Recognition via Human-like Adaptation to Speakers and Language Varieties

May 20, 2025
Via arXiv

Differentiable K-means for Fully-optimized Discrete Token-based ASR

May 22, 2025
Via arXiv

Pretraining Multi-Speaker Identification for Neural Speaker Diarization

May 30, 2025
Via arXiv

Prosodically Enhanced Foreign Accent Simulation by Discrete Token-based Resynthesis Only with Native Speech Corpora

May 22, 2025
Via arXiv

SoccerChat: Integrating Multimodal Data for Enhanced Soccer Game Understanding

May 22, 2025
Via arXiv

FiLLM -- A Filipino-optimized Large Language Model based on Southeast Asia Large Language Model (SEALLM)

May 25, 2025
Via arXiv

Personalized Fine-Tuning with Controllable Synthetic Speech from LLM-Generated Transcripts for Dysarthric Speech Recognition

May 19, 2025
Via arXiv

From Weak Labels to Strong Results: Utilizing 5,000 Hours of Noisy Classroom Transcripts with Minimal Accurate Data

May 20, 2025
Via arXiv

Granary: Speech Recognition and Translation Dataset in 25 European Languages

May 19, 2025
Via arXiv

Private kNN-VC: Interpretable Anonymization of Converted Speech

May 23, 2025
Via arXiv