Speech


Finite Scalar Quantization Enables Redundant and Transmission-Robust Neural Audio Compression at Low Bit-rates

Sep 11, 2025
Via arXiv

Acoustic to Articulatory Speech Inversion for Children with Velopharyngeal Insufficiency

Sep 11, 2025
Via arXiv

Listening for "You": Enhancing Speech Image Retrieval via Target Speaker Extraction

Sep 11, 2025
Via arXiv

MAPSS: Manifold-based Assessment of Perceptual Source Separation

Sep 11, 2025
Via arXiv

GmSLM: Generative Marmoset Spoken Language Modeling

Sep 11, 2025
Via arXiv

EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs

Sep 11, 2025
Via arXiv

LITcoder: A General-Purpose Library for Building and Comparing Encoding Models

Sep 11, 2025
Via arXiv

DiFlow-TTS: Discrete Flow Matching with Factorized Speech Tokens for Low-Latency Zero-Shot Text-To-Speech

Sep 11, 2025
Via arXiv

Behind the Scenes: Mechanistic Interpretability of LoRA-adapted Whisper for Speech Emotion Recognition

Sep 11, 2025
Via arXiv

HISPASpoof: A New Dataset For Spanish Speech Forensics

Sep 11, 2025
Via arXiv