Karolos Nikitaras

Predicting phoneme-level prosody latents using AR and flow-based Prior Networks for expressive speech synthesis

Nov 02, 2022
Via arXiv

Learning utterance-level representations through token-level acoustic latents prediction for Expressive Speech Synthesis

Nov 01, 2022
Via arXiv

Generating Gender-Ambiguous Text-to-Speech Voices

Nov 01, 2022
Via arXiv

Fine-grained Noise Control for Multispeaker Speech Synthesis

Apr 11, 2022
Via arXiv

Self supervised learning for robust voice cloning

Apr 07, 2022
Via arXiv

SOMOS: The Samsung Open MOS Dataset for the Evaluation of Neural Text-to-Speech Synthesis

Apr 06, 2022
Via arXiv