Speech Synthesis


Speech synthesis is the process of generating artificial speech from text using computer algorithms.

LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis

May 05, 2025

Co$^{3}$Gesture: Towards Coherent Concurrent Co-speech 3D Gesture Generation with Interactive Diffusion

May 03, 2025

Scaling On-Device GPU Inference for Large Generative Models

May 01, 2025

AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation

Apr 29, 2025

Towards Flow-Matching-based TTS without Classifier-Free Guidance

Apr 29, 2025

Generative Adversarial Network based Voice Conversion: Techniques, Challenges, and Recent Advancements

Apr 27, 2025

Muyan-TTS: A Trainable Text-to-Speech Model Optimized for Podcast Scenarios with a $50K Budget

Apr 27, 2025

DialogueAgents: A Hybrid Agent-Based Speech Synthesis Framework for Multi-Party Dialogue

Apr 20, 2025

FADEL: Uncertainty-aware Fake Audio Detection with Evidential Deep Learning

Apr 22, 2025

SOLIDO: A Robust Watermarking Method for Speech Synthesis via Low-Rank Adaptation

Apr 21, 2025