speech


T-Mimi: A Transformer-based Mimi Decoder for Real-Time On-Phone TTS

Jan 27, 2026
Via arXiv

VoxPrivacy: A Benchmark for Evaluating Interactional Privacy of Speech Language Models

Jan 27, 2026
Via arXiv

SLM-SS: Speech Language Model for Generative Speech Separation

Jan 27, 2026
Via arXiv

Dynamic Multi-Expert Projectors with Stabilized Routing for Multilingual Speech Recognition

Jan 27, 2026
Via arXiv

SE-DiCoW: Self-Enrolled Diarization-Conditioned Whisper

Jan 27, 2026
Via arXiv

Beyond Lips: Integrating Gesture and Lip Cues for Robust Audio-visual Speaker Extraction

Jan 27, 2026
Via arXiv

A Hybrid Discriminative and Generative System for Universal Speech Enhancement

Jan 27, 2026
Via arXiv

Optimizing Conversational Quality in Spoken Dialogue Systems with Reinforcement Learning from AI Feedback

Jan 27, 2026
Via arXiv

Residual Tokens Enhance Masked Autoencoders for Speech Modeling

Jan 27, 2026
Via arXiv

Distillation-based Layer Dropping (DLD): Effective End-to-end Framework for Dynamic Speech Networks

Jan 27, 2026
Via arXiv