speech recognition


Speech recognition is the task of identifying words spoken aloud, analyzing the voice and language, and accurately transcribing the words.

ADEPT: RL-Aligned Agentic Decoding of Emotion via Evidence Probing Tools -- From Consensus Learning to Ambiguity-Driven Emotion Reasoning

Add code
Feb 13, 2026
Viaarxiv icon

Decoding Ambiguous Emotions with Test-Time Scaling in Audio-Language Models

Add code
Feb 01, 2026
Viaarxiv icon

OCR-Enhanced Multimodal ASR Can Read While Listening

Add code
Jan 26, 2026
Viaarxiv icon

Semantics-Aware Generative Latent Data Augmentation for Learning in Low-Resource Domains

Add code
Feb 02, 2026
Viaarxiv icon

BanglaRobustNet: A Hybrid Denoising-Attention Architecture for Robust Bangla Speech Recognition

Add code
Jan 25, 2026
Viaarxiv icon

Uncertainty-Aware Multimodal Emotion Recognition through Dirichlet Parameterization

Add code
Feb 09, 2026
Viaarxiv icon

CALM: Joint Contextual Acoustic-Linguistic Modeling for Personalization of Multi-Speaker ASR

Add code
Jan 30, 2026
Viaarxiv icon

Enhancing Speech Emotion Recognition using Dynamic Spectral Features and Kalman Smoothing

Add code
Jan 26, 2026
Viaarxiv icon

dLLM-ASR: A Faster Diffusion LLM-based Framework for Speech Recognition

Add code
Jan 25, 2026
Viaarxiv icon

VIBEVOICE-ASR Technical Report

Add code
Jan 26, 2026
Viaarxiv icon