speech


UniTAF: A Modular Framework for Joint Text-to-Speech and Audio-to-Face Modeling

Feb 17, 2026
Via arXiv

Enroll-on-Wakeup: A First Comparative Study of Target Speech Extraction for Seamless Interaction in Real Noisy Human-Machine Dialogue Scenarios

Feb 17, 2026
Via arXiv

Joint Enhancement and Classification using Coupled Diffusion Models of Signals and Logits

Feb 17, 2026
Via arXiv

What Do Neurons Listen To? A Neuron-level Dissection of a General-purpose Audio Model

Feb 17, 2026
Via arXiv

Clinically Inspired Symptom-Guided Depression Detection from Emotion-Aware Speech Representations

Feb 17, 2026
Via arXiv

Under-resourced studies of under-resourced languages: lemmatization and POS-tagging with LLM annotators for historical Armenian, Georgian, Greek and Syriac

Feb 17, 2026
Via arXiv

ZeroSyl: Simple Zero-Resource Syllable Tokenization for Spoken Language Modeling

Feb 17, 2026
Via arXiv

MAEB: Massive Audio Embedding Benchmark

Feb 17, 2026
Via arXiv

Disentangling Pitch and Creak for Speaker Identity Preservation in Speech Synthesis

Feb 16, 2026
Via arXiv

Data Augmentation for Pathological Speech Enhancement

Feb 16, 2026
Via arXiv