Speech Recognition


Speech recognition is the task of identifying spoken words by analyzing the speaker's voice and language, and transcribing those words accurately.

Unifying Speech Recognition, Synthesis and Conversion with Autoregressive Transformers

Jan 15, 2026, via arXiv

AI-based System for Transforming text and sound to Educational Videos

Jan 16, 2026, via arXiv

Speech-Hands: A Self-Reflection Voice Agentic Approach to Speech Recognition and Audio Reasoning with Omni Perception

Jan 14, 2026, via arXiv

MCGA: A Multi-task Classical Chinese Literary Genre Audio Corpus

Jan 14, 2026, via arXiv

SLAM-LLM: A Modular, Open-Source Multimodal Large Language Model Framework and Best Practice for Speech, Language, Audio and Music Processing

Jan 14, 2026, via arXiv

Robust CAPTCHA Using Audio Illusions in the Era of Large Language Models: from Evaluation to Advances

Jan 13, 2026, via arXiv

Categorize Early, Integrate Late: Divergent Processing Strategies in Automatic Speech Recognition

Jan 11, 2026, via arXiv

Towards Comprehensive Semantic Speech Embeddings for Chinese Dialects

Jan 12, 2026, via arXiv

Doing More with Less: Data Augmentation for Sudanese Dialect Automatic Speech Recognition

Jan 11, 2026, via arXiv

Task Arithmetic with Support Languages for Low-Resource ASR

Jan 11, 2026, via arXiv