speech


Read to Hear: A Zero-Shot Pronunciation Assessment Using Textual Descriptions and LLMs

Sep 17, 2025
Via arXiv

CS-FLEURS: A Massively Multilingual and Code-Switched Speech Dataset

Sep 17, 2025
Via arXiv

Canary-1B-v2 & Parakeet-TDT-0.6B-v3: Efficient and High-Performance Models for Multilingual ASR and AST

Sep 17, 2025
Via arXiv

Language Conditioning Improves Accuracy of Aircraft Goal Prediction in Untowered Airspace

Sep 17, 2025
Via arXiv

Defending Diffusion Models Against Membership Inference Attacks via Higher-Order Langevin Dynamics

Sep 17, 2025
Via arXiv

Deploying UDM Series in Real-Life Stuttered Speech Applications: A Clinical Evaluation Framework

Sep 17, 2025
Via arXiv

Audio-Based Crowd-Sourced Evaluation of Machine Translation Quality

Sep 17, 2025
Via arXiv

MICA: Multi-Agent Industrial Coordination Assistant

Sep 17, 2025
Via arXiv

A Lightweight Fourier-based Network for Binaural Speech Enhancement with Spatial Cue Preservation

Sep 17, 2025
Via arXiv

A Lightweight Pipeline for Noisy Speech Voice Cloning and Accurate Lip Sync Synthesis

Sep 16, 2025
Via arXiv