Yongmao Zhang

Text-aware and Context-aware Expressive Audiobook Speech Synthesis

Jun 12, 2024

Accent-VITS: Accent Transfer for End-to-End TTS

Dec 29, 2023

PromptSpeaker: Speaker Generation Based on Text Descriptions

Oct 08, 2023

METTS: Multilingual Emotional Text-to-Speech by Cross-speaker and Cross-lingual Emotion Transfer

Jul 29, 2023

The NPU-MSXF Speech-to-Speech Translation System for IWSLT 2023 Speech-to-Speech Translation Task

Jul 10, 2023

PromptStyle: Controllable Style Transfer for Text-to-Speech with Natural Language Descriptions

Jun 01, 2023

Multi-Speaker Expressive Speech Synthesis via Multiple Factors Decoupling

Nov 19, 2022

VISinger 2: High-Fidelity End-to-End Singing Voice Synthesis Enhanced by Digital Signal Processing Synthesizer

Nov 05, 2022

DSPGAN: A GAN-Based Universal Vocoder for High-Fidelity TTS by Time-Frequency Domain Supervision from DSP

Nov 02, 2022

Robust MelGAN: A Robust Universal Neural Vocoder for High-Fidelity TTS

Nov 02, 2022