speech


Discourse-Aware Dual-Track Streaming Response for Low-Latency Spoken Dialogue Systems

Feb 26, 2026
Via arXiv

Deepfake Word Detection by Next-token Prediction using Fine-tuned Whisper

Feb 26, 2026
Via arXiv

A Mixture-of-Experts Model for Multimodal Emotion Recognition in Conversations

Feb 26, 2026
Via arXiv

Make It Hard to Hear, Easy to Learn: Long-Form Bengali ASR and Speaker Diarization via Extreme Augmentation and Perfect Alignment

Feb 26, 2026
Via arXiv

Relating the Neural Representations of Vocalized, Mimed, and Imagined Speech

Feb 26, 2026
Via arXiv

Efficient Dialect-Aware Modeling and Conditioning for Low-Resource Taiwanese Hakka Speech Processing

Feb 26, 2026
Via arXiv

Modality Collapse as Mismatched Decoding: Information-Theoretic Limits of Multimodal LLMs

Feb 26, 2026
Via arXiv

Moving Speaker Separation via Parallel Spectral-Spatial Processing

Feb 25, 2026
Via arXiv

TG-ASR: Translation-Guided Learning with Parallel Gated Cross Attention for Low-Resource Automatic Speech Recognition

Feb 25, 2026
Via arXiv

OmniZip: Learning a Unified and Lightweight Lossless Compressor for Multi-Modal Data

Feb 25, 2026
Via arXiv