speech


MORE: Multi-Objective Adversarial Attacks on Speech Recognition

Jan 05, 2026
Via arXiv

Quantifying Quanvolutional Neural Networks Robustness for Speech in Healthcare Applications

Jan 05, 2026
Via arXiv

Bridging the gap: A comparative exploration of Speech-LLM and end-to-end architecture for multilingual conversational ASR

Jan 04, 2026
Via arXiv

OV-InstructTTS: Towards Open-Vocabulary Instruct Text-to-Speech

Jan 04, 2026
Via arXiv

SmoothSync: Dual-Stream Diffusion Transformers for Jitter-Robust Beat-Synchronized Gesture Generation from Quantized Audio

Jan 04, 2026
Via arXiv

UltraEval-Audio: A Unified Framework for Comprehensive Evaluation of Audio Foundation Models

Jan 04, 2026
Via arXiv

LEMAS: A 150K-Hour Large-scale Extensible Multilingual Audio Suite with Generative Speech Models

Jan 04, 2026
Via arXiv

FC-CONAN: An Exhaustively Paired Dataset for Robust Evaluation of Retrieval Systems

Jan 04, 2026
Via arXiv

IO-RAE: Information-Obfuscation Reversible Adversarial Example for Audio Privacy Protection

Jan 03, 2026
Via arXiv

Listen, Attend, Understand: a Regularization Technique for Stable E2E Speech Translation Training on High Variance labels

Jan 03, 2026
Via arXiv