speech recognition


Speech recognition is the task of identifying words spoken aloud, analyzing the voice and language, and accurately transcribing the words.

Sink or SWIM: Tackling Real-Time ASR at Scale

Add code
Jan 22, 2026
Viaarxiv icon

Deaf and Hard of Hearing Access to Intelligent Personal Assistants: Comparison of Voice-Based Options with an LLM-Powered Touch Interface

Add code
Jan 21, 2026
Viaarxiv icon

A Baseline Multimodal Approach to Emotion Recognition in Conversations

Add code
Jan 31, 2026
Viaarxiv icon

SoundBreak: A Systematic Study of Audio-Only Adversarial Attacks on Trimodal Models

Add code
Jan 20, 2026
Viaarxiv icon

Speech-Hands: A Self-Reflection Voice Agentic Approach to Speech Recognition and Audio Reasoning with Omni Perception

Add code
Jan 14, 2026
Viaarxiv icon

Unifying Speech Recognition, Synthesis and Conversion with Autoregressive Transformers

Add code
Jan 15, 2026
Viaarxiv icon

Lost in Transcription: How Speech-to-Text Errors Derail Code Understanding

Add code
Jan 20, 2026
Viaarxiv icon

PRiSM: Benchmarking Phone Realization in Speech Models

Add code
Jan 20, 2026
Viaarxiv icon

CTC-DID: CTC-Based Arabic dialect identification for streaming applications

Add code
Jan 18, 2026
Viaarxiv icon

WenetSpeech-Wu: Datasets, Benchmarks, and Models for a Unified Chinese Wu Dialect Speech Processing Ecosystem

Add code
Jan 16, 2026
Viaarxiv icon