Speech Synthesis


Speech synthesis is the process of generating artificial speech from text using computer algorithms.

ARCHI-TTS: A flow-matching-based Text-to-Speech Model with Self-supervised Semantic Aligner and Accelerated Inference

Feb 05, 2026
Via arXiv

Zero-Shot TTS With Enhanced Audio Prompts: BSC Submission for the 2026 WildSpoof Challenge TTS Track

Feb 05, 2026
Via arXiv

Universal Robust Speech Adaptation for Cross-Domain Speech Recognition and Enhancement

Feb 04, 2026
Via arXiv

VividVoice: A Unified Framework for Scene-Aware Visually-Driven Speech Synthesis

Feb 01, 2026
Via arXiv

EmoAra: Emotion-Preserving English Speech Transcription and Cross-Lingual Translation with Arabic Text-to-Speech

Feb 01, 2026
Via arXiv

CoCoEmo: Composable and Controllable Human-Like Emotional TTS via Activation Steering

Feb 03, 2026
Via arXiv

EmoShift: Lightweight Activation Steering for Enhanced Emotion-Aware Speech Synthesis

Jan 30, 2026
Via arXiv

Speech Quality-Based Localization of Low-Quality Speech and Text-to-Speech Synthesis Artefacts

Jan 29, 2026
Via arXiv

Kanade: A Simple Disentangled Tokenizer for Spoken Language Modeling

Jan 31, 2026
Via arXiv

Unit-Based Agent for Semi-Cascaded Full-Duplex Dialogue Systems

Jan 29, 2026
Via arXiv