Jian Cong

U-Style: Cascading U-nets with Multi-level Speaker and Style Modeling for Zero-Shot Voice Cloning

Oct 06, 2023
Tao Li, Zhichao Wang, Xinfa Zhu, Jian Cong, Qiao Tian, Yuping Wang, Lei Xie

DiCLET-TTS: Diffusion Model based Cross-lingual Emotion Transfer for Text-to-Speech -- A Study between English and Mandarin

Sep 02, 2023
Tao Li, Chenxu Hu, Jian Cong, Xinfa Zhu, Jingbei Li, Qiao Tian, Yuping Wang, Lei Xie

Robust MelGAN: A robust universal neural vocoder for high-fidelity TTS

Nov 02, 2022
Kun Song, Jian Cong, Xinsheng Wang, Yongmao Zhang, Lei Xie, Ning Jiang, Haiying Wu

DSPGAN: a GAN-based universal vocoder for high-fidelity TTS by time-frequency domain supervision from DSP

Nov 02, 2022
Kun Song, Yongmao Zhang, Yi Lei, Jian Cong, Hanzhao Li, Lei Xie, Gang He, Jinfeng Bai

Glow-WaveGAN 2: High-quality Zero-shot Text-to-speech Synthesis and Any-to-any Voice Conversion

Jul 05, 2022
Yi Lei, Shan Yang, Jian Cong, Lei Xie, Dan Su

AdaVITS: Tiny VITS for Low Computing Resource Speaker Adaptation

Jun 01, 2022
Kun Song, Heyang Xue, Xinsheng Wang, Jian Cong, Yongmao Zhang, Lei Xie, Bing Yang, Xiong Zhang, Dan Su

NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality

May 10, 2022
Xu Tan, Jiawei Chen, Haohe Liu, Jian Cong, Chen Zhang, Yanqing Liu, Xi Wang, Yichong Leng, Yuanhao Yi, Lei He, Frank Soong, Tao Qin, Sheng Zhao, Tie-Yan Liu

VISinger: Variational Inference with Adversarial Learning for End-to-End Singing Voice Synthesis

Oct 17, 2021
Yongmao Zhang, Jian Cong, Heyang Xue, Lei Xie, Pengcheng Zhu, Mengxiao Bi

Glow-WaveGAN: Learning Speech Representations from GAN-based Variational Auto-Encoder For High Fidelity Flow-based Speech Synthesis

Jun 22, 2021
Jian Cong, Shan Yang, Lei Xie, Dan Su
