Speech recognition


Speech recognition is the task of identifying words spoken aloud by analyzing the speaker's voice and language, and transcribing those words accurately as text.

D-ORCA: Dialogue-Centric Optimization for Robust Audio-Visual Captioning

Feb 08, 2026

Adapting Where It Matters: Depth-Aware Adaptation for Efficient Multilingual Speech Recognition in Low-Resource Languages

Feb 01, 2026

From Hallucination to Articulation: Language Model-Driven Losses for Ultra Low-Bitrate Neural Speech Coding

Feb 05, 2026

EmoAra: Emotion-Preserving English Speech Transcription and Cross-Lingual Translation with Arabic Text-to-Speech

Feb 01, 2026

Benchmarking Automatic Speech Recognition for Indian Languages in Agricultural Contexts

Jan 31, 2026

Attention-weighted Centered Kernel Alignment for Knowledge Distillation in Large Audio-Language Models Applied to Speech Emotion Recognition

Feb 02, 2026

asr_eval: Algorithms and tools for multi-reference and streaming speech recognition evaluation

Jan 28, 2026

Streaming Speech Recognition with Decoder-Only Large Language Models and Latency Optimization

Jan 30, 2026

VowelPrompt: Hearing Speech Emotions from Text via Vowel-level Prosodic Augmentation

Feb 06, 2026

Towards Robust Dysarthric Speech Recognition: LLM-Agent Post-ASR Correction Beyond WER

Jan 29, 2026