Speech recognition


Speech recognition is the task of identifying words spoken aloud by analyzing the speaker's voice and language, then transcribing those words accurately as text.

Decoding Strategies for Diffusion-Based ASR: A Systematic Evaluation of Confidence-Based Thresholding

May 28, 2026
Via arXiv

MMTM: Tri-Modal Topic Modeling for Long-Form Video via Similarity-Gated Fusion

May 28, 2026
Via arXiv

TARQ: Tail-Aware Reconstruction Quantization for Rare-Word Robust Automatic Speech Recognition

May 27, 2026
Via arXiv

Decentralized LLM-Driven Coordination of Acoustic Robots for Contactless Object Manipulation

May 28, 2026
Via arXiv

Beyond the Mouth: Upper-Face Affective Cues in Audiovisual Sentence Recognition under Acoustic Uncertainty

May 30, 2026
Via arXiv

Breaking the Script Barrier: Enabling Automatic Alignment for PoS-based ASR Error Analysis in Non-Latin Scripts

May 27, 2026
Via arXiv

Bandwidth-Efficient and Privacy-Preserving Edge-Cloud Many-to-Many Speech Translation

May 27, 2026
Via arXiv

PolySpeech-100: A Large-Scale Benchmark for Speech Understanding Across 100+ Languages and Dialects

May 31, 2026
Via arXiv

HoliTok: A Continuous Holistic Tokenization with Robust Dual Capabilities of Speech Generation and Understanding

May 28, 2026
Via arXiv

FalAR: A Large-scale Speaker-Annotated European Portuguese Speech Corpus of Parliamentary Sessions

May 26, 2026
Via arXiv