TTS


OmniVoice: Towards Omnilingual Zero-Shot Text-to-Speech with Diffusion Language Models

Apr 02, 2026
Via arXiv

Captioning Daily Activity Images in Early Childhood Education: Benchmark and Algorithm

Apr 02, 2026
Via arXiv

T5Gemma-TTS Technical Report

Apr 02, 2026
Via arXiv

MambaVoiceCloning: Efficient and Expressive Text-to-Speech via State-Space Modeling and Diffusion Control

Mar 31, 2026
Via arXiv

The Thiomi Dataset: A Large-Scale Multimodal Corpus for Low-Resource African Languages

Mar 31, 2026
Via arXiv

Combining Masked Language Modeling and Cross-Modal Contrastive Learning for Prosody-Aware TTS

Mar 31, 2026
Via arXiv

LongCat-AudioDiT: High-Fidelity Diffusion Text-to-Speech in the Waveform Latent Space

Mar 31, 2026
Via arXiv

ParaSpeechCLAP: A Dual-Encoder Speech-Text Model for Rich Stylistic Language-Audio Pretraining

Mar 30, 2026
Via arXiv

Voxtral TTS

Mar 26, 2026
Via arXiv

How Open is Open TTS? A Practical Evaluation of Open Source TTS Tools for Romanian

Mar 25, 2026
Via arXiv