Speech recognition


Speech recognition is the task of identifying the words in spoken audio and transcribing them accurately as text.

FlexCTC: GPU-powered CTC Beam Decoding With Advanced Contextual Abilities

Aug 13, 2025

Lessons Learnt: Revisit Key Training Strategies for Effective Speech Emotion Recognition in the Wild

Aug 10, 2025

Out of the Box, into the Clinic? Evaluating State-of-the-Art ASR for Clinical Applications for Older Adults

Aug 12, 2025

SPGISpeech 2.0: Transcribed multi-speaker financial audio for speaker-tagged transcription

Aug 07, 2025

TurboBias: Universal ASR Context-Biasing powered by GPU-accelerated Phrase-Boosting Tree

Aug 12, 2025

Munsit at NADI 2025 Shared Task 2: Pushing the Boundaries of Multidialectal Arabic ASR with Weakly Supervised Pretraining and Continual Supervised Fine-tuning

Aug 12, 2025

A Small-footprint Acoustic Echo Cancellation Solution for Mobile Full-Duplex Speech Interactions

Aug 11, 2025

Scene-Aware Vectorized Memory Multi-Agent Framework with Cross-Modal Differentiated Quantization VLMs for Visually Impaired Assistance

Aug 25, 2025

Large Language Model Data Generation for Enhanced Intent Recognition in German Speech

Aug 08, 2025

A Survey on Non-Intrusive ASR Refinement: From Output-Level Correction to Full-Model Distillation

Aug 10, 2025