Speech Recognition


Speech recognition is the task of identifying words spoken aloud by analyzing the speaker's voice and language, and transcribing those words accurately as text.

VoxEmo: Benchmarking Speech Emotion Recognition with Speech LLMs

Mar 09, 2026, via arXiv

Continued Pretraining for Low-Resource Swahili ASR: Achieving State-of-the-Art Performance with Minimal Labeled Data

Mar 11, 2026, via arXiv

Sequence-Level Unsupervised Training in Speech Recognition: A Theoretical Study

Mar 02, 2026, via arXiv

Synthetic Data Domain Adaptation for ASR via LLM-based Text and Phonetic Respelling Augmentation

Mar 11, 2026, via arXiv

AlphaFlowTSE: One-Step Generative Target Speaker Extraction via Conditional AlphaFlow

Mar 11, 2026, via arXiv

Robust LLM-based Audio-Visual Speech Recognition with Sparse Modality Alignment and Visual Unit-Guided Refinement

Mar 04, 2026, via arXiv

Federated Heterogeneous Language Model Optimization for Hybrid Automatic Speech Recognition

Mar 05, 2026, via arXiv

Duration Aware Scheduling for ASR Serving Under Workload Drift

Mar 11, 2026, via arXiv

Beyond Word Error Rate: Auditing the Diversity Tax in Speech Recognition through Dataset Cartography

Mar 05, 2026, via arXiv

SCENEBench: An Audio Understanding Benchmark Grounded in Assistive and Industrial Use Cases

Mar 10, 2026, via arXiv