Speech Recognition


Speech recognition is the task of identifying words spoken aloud, analyzing the voice and language, and accurately transcribing the words.

TARQ: Tail-Aware Reconstruction Quantization for Rare-Word Robust Automatic Speech Recognition

May 27, 2026

Decentralized LLM-Driven Coordination of Acoustic Robots for Contactless Object Manipulation

May 28, 2026

PolySpeech-100: A Large-Scale Benchmark for Speech Understanding Across 100+ Languages and Dialects

May 31, 2026

Breaking the Script Barrier: Enabling Automatic Alignment for PoS-based ASR Error Analysis in Non-Latin Scripts

May 27, 2026

Bandwidth-Efficient and Privacy-Preserving Edge-Cloud Many-to-Many Speech Translation

May 27, 2026

HoliTok: A Continuous Holistic Tokenization with Robust Dual Capabilities of Speech Generation and Understanding

May 28, 2026

Recognizing Co-Speech Gestures in-the-Wild

May 29, 2026

FalAR: A Large-scale Speaker-Annotated European Portuguese Speech Corpus of Parliamentary Sessions

May 26, 2026

UNIQUE: Universal Top-k Sparse Attention for Training-free Inference and Sparsity-aware Training

May 26, 2026

Proactive for Uncertainty: Cause-Aware Error Diagnosis and Interactive Clarification for Spoken Dialogue Systems

May 25, 2026