
Byoung Jin Choi

MakeSinger: A Semi-Supervised Training Method for Data-Efficient Singing Voice Synthesis via Classifier-free Diffusion Guidance

Jun 10, 2024

Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction

Jan 03, 2024

Transduce and Speak: Neural Transducer for Text-to-Speech with Semantic Token Prediction

Nov 08, 2023

SNAC: Speaker-normalized affine coupling layer in flow-based architecture for zero-shot multi-speaker text-to-speech

Nov 30, 2022

Adversarial Speaker-Consistency Learning Using Untranscribed Speech Data for Zero-Shot Multi-Speaker Text-to-Speech

Oct 12, 2022

Transfer Learning Framework for Low-Resource Text-to-Speech using a Large-Scale Unlabeled Speech Corpus

Mar 29, 2022

Diff-TTS: A Denoising Diffusion Model for Text-to-Speech

Apr 03, 2021

Expressive Text-to-Speech using Style Tag

Apr 01, 2021

WaveNODE: A Continuous Normalizing Flow for Speech Synthesis

Jul 02, 2020