Jian Cong

Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

Jun 04, 2024

U-Style: Cascading U-nets with Multi-level Speaker and Style Modeling for Zero-Shot Voice Cloning

Oct 06, 2023

DiCLET-TTS: Diffusion Model based Cross-lingual Emotion Transfer for Text-to-Speech -- A Study between English and Mandarin

Sep 02, 2023

Robust MelGAN: A robust universal neural vocoder for high-fidelity TTS

Nov 02, 2022

DSPGAN: a GAN-based universal vocoder for high-fidelity TTS by time-frequency domain supervision from DSP

Nov 02, 2022

Glow-WaveGAN 2: High-quality Zero-shot Text-to-speech Synthesis and Any-to-any Voice Conversion

Jul 05, 2022

AdaVITS: Tiny VITS for Low Computing Resource Speaker Adaptation

Jun 01, 2022

NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality

May 10, 2022

VISinger: Variational Inference with Adversarial Learning for End-to-End Singing Voice Synthesis

Oct 17, 2021

Glow-WaveGAN: Learning Speech Representations from GAN-based Variational Auto-Encoder For High Fidelity Flow-based Speech Synthesis

Jun 22, 2021