Speech recognition


Speech recognition is the task of converting spoken language into text: identifying the words in an audio signal, drawing on both acoustic and linguistic cues, and transcribing them accurately.

Bootstrapping Audiovisual Speech Recognition in Zero-AV-Resource Scenarios with Synthetic Visual Data

Mar 09, 2026
Via arXiv

On the Emotion Understanding of Synthesized Speech

Mar 17, 2026
Via arXiv

Polyglot-Lion: Efficient Multilingual ASR for Singapore via Balanced Fine-Tuning of Qwen3-ASR

Mar 17, 2026
Via arXiv

VoxEmo: Benchmarking Speech Emotion Recognition with Speech LLMs

Mar 09, 2026
Via arXiv

Learnable Pulse Accumulation for On-Device Speech Recognition: How Much Attention Do You Need?

Mar 11, 2026
Via arXiv

FireRedASR2S: A State-of-the-Art Industrial-Grade All-in-One Automatic Speech Recognition System

Mar 11, 2026
Via arXiv

Federated Heterogeneous Language Model Optimization for Hybrid Automatic Speech Recognition

Mar 05, 2026
Via arXiv

Continued Pretraining for Low-Resource Swahili ASR: Achieving State-of-the-Art Performance with Minimal Labeled Data

Mar 11, 2026
Via arXiv

Synthetic Data Domain Adaptation for ASR via LLM-based Text and Phonetic Respelling Augmentation

Mar 11, 2026
Via arXiv

AlphaFlowTSE: One-Step Generative Target Speaker Extraction via Conditional AlphaFlow

Mar 11, 2026
Via arXiv