speech recognition


Speech recognition is the task of identifying words spoken aloud, analyzing the voice and language, and accurately transcribing the words.

FlexCTC: GPU-powered CTC Beam Decoding With Advanced Contextual Abilities

Add code
Aug 13, 2025
Viaarxiv icon

Lessons Learnt: Revisit Key Training Strategies for Effective Speech Emotion Recognition in the Wild

Add code
Aug 10, 2025
Viaarxiv icon

SPGISpeech 2.0: Transcribed multi-speaker financial audio for speaker-tagged transcription

Add code
Aug 07, 2025
Viaarxiv icon

Out of the Box, into the Clinic? Evaluating State-of-the-Art ASR for Clinical Applications for Older Adults

Add code
Aug 12, 2025
Viaarxiv icon

TurboBias: Universal ASR Context-Biasing powered by GPU-accelerated Phrase-Boosting Tree

Add code
Aug 12, 2025
Figure 1 for TurboBias: Universal ASR Context-Biasing powered by GPU-accelerated Phrase-Boosting Tree
Figure 2 for TurboBias: Universal ASR Context-Biasing powered by GPU-accelerated Phrase-Boosting Tree
Figure 3 for TurboBias: Universal ASR Context-Biasing powered by GPU-accelerated Phrase-Boosting Tree
Figure 4 for TurboBias: Universal ASR Context-Biasing powered by GPU-accelerated Phrase-Boosting Tree
Viaarxiv icon

Munsit at NADI 2025 Shared Task 2: Pushing the Boundaries of Multidialectal Arabic ASR with Weakly Supervised Pretraining and Continual Supervised Fine-tuning

Add code
Aug 12, 2025
Viaarxiv icon

A Small-footprint Acoustic Echo Cancellation Solution for Mobile Full-Duplex Speech Interactions

Add code
Aug 11, 2025
Viaarxiv icon

Large Language Model Data Generation for Enhanced Intent Recognition in German Speech

Add code
Aug 08, 2025
Viaarxiv icon

Scene-Aware Vectorized Memory Multi-Agent Framework with Cross-Modal Differentiated Quantization VLMs for Visually Impaired Assistance

Add code
Aug 25, 2025
Viaarxiv icon

A Survey on Non-Intrusive ASR Refinement: From Output-Level Correction to Full-Model Distillation

Add code
Aug 10, 2025
Viaarxiv icon