speech recognition


Speech recognition is the task of identifying words spoken aloud, analyzing the voice and language, and accurately transcribing the words.

VocalNet-MDM: Accelerating Streaming Speech LLM via Self-Distilled Masked Diffusion Modeling

Add code
Feb 09, 2026
Viaarxiv icon

Decoding Ambiguous Emotions with Test-Time Scaling in Audio-Language Models

Add code
Feb 01, 2026
Viaarxiv icon

CALM: Joint Contextual Acoustic-Linguistic Modeling for Personalization of Multi-Speaker ASR

Add code
Jan 30, 2026
Viaarxiv icon

Semantics-Aware Generative Latent Data Augmentation for Learning in Low-Resource Domains

Add code
Feb 02, 2026
Viaarxiv icon

Text-only adaptation in LLM-based ASR through text denoising

Add code
Jan 28, 2026
Viaarxiv icon

A Study of Data Selection Strategies for Pre-training Self-Supervised Speech Models

Add code
Jan 28, 2026
Viaarxiv icon

Uncertainty-Aware Multimodal Emotion Recognition through Dirichlet Parameterization

Add code
Feb 09, 2026
Viaarxiv icon

ADEPT: RL-Aligned Agentic Decoding of Emotion via Evidence Probing Tools -- From Consensus Learning to Ambiguity-Driven Emotion Reasoning

Add code
Feb 13, 2026
Viaarxiv icon

Mind the Shift: Using Delta SSL Embeddings to Enhance Child ASR

Add code
Jan 28, 2026
Viaarxiv icon

Qwen3-ASR Technical Report

Add code
Jan 29, 2026
Viaarxiv icon