speech


Learning to Generate Pointing Gestures in Situated Embodied Conversational Agents

Sep 15, 2025 (via arXiv)

Acoustic to Articulatory Speech Inversion for Children with Velopharyngeal Insufficiency

Sep 11, 2025 (via arXiv)

Finite Scalar Quantization Enables Redundant and Transmission-Robust Neural Audio Compression at Low Bit-rates

Sep 11, 2025 (via arXiv)

Listening for "You": Enhancing Speech Image Retrieval via Target Speaker Extraction

Sep 11, 2025 (via arXiv)

EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs

Sep 11, 2025 (via arXiv)

DiFlow-TTS: Discrete Flow Matching with Factorized Speech Tokens for Low-Latency Zero-Shot Text-To-Speech

Sep 11, 2025 (via arXiv)

GmSLM: Generative Marmoset Spoken Language Modeling

Sep 11, 2025 (via arXiv)

Behind the Scenes: Mechanistic Interpretability of LoRA-adapted Whisper for Speech Emotion Recognition

Sep 11, 2025 (via arXiv)

LITcoder: A General-Purpose Library for Building and Comparing Encoding Models

Sep 11, 2025 (via arXiv)

Bona fide Cross Testing Reveals Weak Spot in Audio Deepfake Detection Systems

Sep 11, 2025 (via arXiv)