Speech recognition


Speech recognition is the task of identifying spoken words by analyzing the speaker's voice and language, and transcribing those words accurately as text.

VocalNet-MDM: Accelerating Streaming Speech LLM via Self-Distilled Masked Diffusion Modeling

Feb 09, 2026
Via arXiv

Text-only adaptation in LLM-based ASR through text denoising

Jan 28, 2026
Via arXiv

Qwen3-ASR Technical Report

Jan 29, 2026
Via arXiv

Uncertainty-Aware Multimodal Emotion Recognition through Dirichlet Parameterization

Feb 09, 2026
Via arXiv

ADEPT: RL-Aligned Agentic Decoding of Emotion via Evidence Probing Tools -- From Consensus Learning to Ambiguity-Driven Emotion Reasoning

Feb 13, 2026
Via arXiv

Multilingual Dysarthric Speech Assessment Using Universal Phone Recognition and Language-Specific Phonemic Contrast Modeling

Jan 29, 2026
Via arXiv

Factored Reasoning with Inner Speech and Persistent Memory for Evidence-Grounded Human-Robot Interaction

Jan 31, 2026
Via arXiv

DementiaBank-Emotion: A Multi-Rater Emotion Annotation Corpus for Alzheimer's Disease Speech (Version 1.0)

Feb 04, 2026
Via arXiv

A Baseline Multimodal Approach to Emotion Recognition in Conversations

Jan 31, 2026
Via arXiv

An Effective Energy Mask-based Adversarial Evasion Attacks against Misclassification in Speaker Recognition Systems

Jan 29, 2026
Via arXiv