speech


Reducing the Offline-Streaming Gap for Unified ASR Transducer with Consistency Regularization

Apr 21, 2026

CoInteract: Physically-Consistent Human-Object Interaction Video Synthesis via Spatially-Structured Co-Generation

Apr 21, 2026

Towards Streaming Target Speaker Extraction via Chunk-wise Interleaved Splicing of Autoregressive Language Model

Apr 21, 2026

Achieving Interaction Fluidity in a Wizard-of-Oz Robotic System: A Prototype for Fluid Error-Correction

Apr 21, 2026

UAF: A Unified Audio Front-end LLM for Full-Duplex Speech Interaction

Apr 21, 2026

Voice of India: A Large-Scale Benchmark for Real-World Speech Recognition in India

Apr 21, 2026

Detecting Hallucinations in SpeechLLMs at Inference Time Using Attention Maps

Apr 21, 2026

HalluAudio: A Comprehensive Benchmark for Hallucination Detection in Large Audio-Language Models

Apr 21, 2026

Tadabur: A Large-Scale Quran Audio Dataset

Apr 21, 2026

Deep Supervised Contrastive Learning of Pitch Contours for Robust Pitch Accent Classification in Seoul Korean

Apr 21, 2026