Speech recognition


Speech recognition is the task of identifying the words spoken in an audio signal, by analyzing both the voice and the language, and transcribing them accurately as text.

Make It Hard to Hear, Easy to Learn: Long-Form Bengali ASR and Speaker Diarization via Extreme Augmentation and Perfect Alignment

Feb 26, 2026

NasoVoce: A Nose-Mounted Low-Audibility Speech Interface for Always-Available Speech Interaction

Mar 11, 2026

DashengTokenizer: One layer is enough for unified audio understanding and generation

Feb 27, 2026

Mitigating Structural Noise in Low-Resource S2TT: An Optimized Cascaded Nepali-English Pipeline with Punctuation Restoration

Feb 25, 2026

An Approach to Combining Video and Speech with Large Language Models in Human-Robot Interaction

Feb 23, 2026

iMiGUE-Speech: A Spontaneous Speech Dataset for Affective Analysis

Feb 25, 2026

Whisper: Courtside Edition Enhancing ASR Performance Through LLM-Driven Context Generation

Feb 21, 2026

BROTHER: Behavioral Recognition Optimized Through Heterogeneous Ensemble Regularization for Ambivalence and Hesitancy

Mar 15, 2026

The Patrologia Graeca Corpus: OCR, Annotation, and Open Release of Noisy Nineteenth-Century Polytonic Greek Editions

Mar 10, 2026

Voice-Driven Semantic Perception for UAV-Assisted Emergency Networks

Feb 19, 2026