speech


Face-to-Face: A Video Dataset for Multi-Person Interaction Modeling

Mar 16, 2026
Via arXiv

Deep Filter Estimation from Inter-Frame Correlations for Monaural Speech Dereverberation

Mar 16, 2026
Via arXiv

spINAch: A Diachronic Corpus of French Broadcast Speech Controlled for Speakers' Age and Gender

Mar 16, 2026
Via arXiv

Modeling and Benchmarking Spoken Dialogue Rewards with Modality and Colloquialness

Mar 16, 2026
Via arXiv

FreeTalk: Emotional Topology-Free 3D Talking Heads

Mar 16, 2026
Via arXiv

Aligning Paralinguistic Understanding and Generation in Speech LLMs via Multi-Task Reinforcement Learning

Mar 16, 2026
Via arXiv

SoulX-Duplug: Plug-and-Play Streaming State Prediction Module for Realtime Full-Duplex Speech Conversation

Mar 16, 2026
Via arXiv

BROTHER: Behavioral Recognition Optimized Through Heterogeneous Ensemble Regularization for Ambivalence and Hesitancy

Mar 15, 2026
Via arXiv

Localizing and Editing Knowledge in Large Audio-Language Models

Mar 15, 2026
Via arXiv

Controllable Accent Normalization via Discrete Diffusion

Mar 15, 2026
Via arXiv