Speech recognition


Speech recognition is the task of identifying the words in spoken audio by analyzing the voice and the language, and transcribing them accurately as text.

Enhancing Speech Emotion Recognition using Dynamic Spectral Features and Kalman Smoothing

Jan 26, 2026
Via arXiv

VIBEVOICE-ASR Technical Report

Jan 26, 2026
Via arXiv

Language Family Matters: Evaluating LLM-Based ASR Across Linguistic Boundaries

Jan 26, 2026
Via arXiv

Unheard in the Digital Age: Rethinking AI Bias and Speech Diversity

Jan 26, 2026
Via arXiv

Efficient Rehearsal for Continual Learning in ASR via Singular Value Tuning

Jan 26, 2026
Via arXiv

Noise-Robust AV-ASR Using Visual Features Both in the Whisper Encoder and Decoder

Jan 26, 2026
Via arXiv

BanglaRobustNet: A Hybrid Denoising-Attention Architecture for Robust Bangla Speech Recognition

Jan 25, 2026
Via arXiv

dLLM-ASR: A Faster Diffusion LLM-based Framework for Speech Recognition

Jan 25, 2026
Via arXiv

From Human Speech to Ocean Signals: Transferring Speech Large Models for Underwater Acoustic Target Recognition

Jan 26, 2026
Via arXiv

SpatialEmb: Extract and Encode Spatial Information for 1-Stage Multi-channel Multi-speaker ASR on Arbitrary Microphone Arrays

Jan 25, 2026
Via arXiv