speech


MoE Adapter for Large Audio Language Models: Sparsity, Disentanglement, and Gradient-Conflict-Free

Jan 08, 2026
Via arXiv

TellWhisper: Tell Whisper Who Speaks When

Jan 08, 2026
Via arXiv

MM-Sonate: Multimodal Controllable Audio-Video Generation with Zero-Shot Voice Cloning

Jan 08, 2026
Via arXiv

ReStyle-TTS: Relative and Continuous Style Control for Zero-Shot Speech Synthesis

Jan 07, 2026
Via arXiv

Self-Explaining Hate Speech Detection with Moral Rationales

Jan 07, 2026
Via arXiv

Stuttering-Aware Automatic Speech Recognition for Indonesian Language

Jan 07, 2026
Via arXiv

Lightweight and perceptually-guided voice conversion for electro-laryngeal speech

Jan 07, 2026
Via arXiv

ASVspoof 5: Evaluation of Spoofing, Deepfake, and Adversarial Attack Detection Using Crowdsourced Speech

Jan 07, 2026
Via arXiv

SpeakerSleuth: Evaluating Large Audio-Language Models as Judges for Multi-turn Speaker Consistency

Jan 07, 2026
Via arXiv

MiJaBench: Revealing Minority Biases in Large Language Models via Hate Speech Jailbreaking

Jan 07, 2026
Via arXiv