
Tomoki Koriyama

VAE-based Phoneme Alignment Using Gradient Annealing and SSL Acoustic Features

Jul 03, 2024

An Attribute Interpolation Method in Speech Synthesis by Model Merging

Jun 30, 2024

Frame-Wise Breath Detection with Self-Training: An Exploration of Enhancing Breath Naturalness in Text-to-Speech

Feb 01, 2024

Duration-aware pause insertion using pre-trained language model for multi-speaker text-to-speech

Feb 27, 2023

Structured State Space Decoder for Speech Recognition and Synthesis

Oct 31, 2022

UTMOS: UTokyo-SaruLab System for VoiceMOS Challenge 2022

Apr 05, 2022

Multi-speaker Text-to-speech Synthesis Using Deep Gaussian Processes

Aug 07, 2020

Utterance-level Sequential Modeling For Deep Gaussian Process Based Speech Synthesis Using Simple Recurrent Unit

Apr 22, 2020

Generative Moment Matching Network-based Random Modulation Post-filter for DNN-based Singing Voice Synthesis and Neural Double-tracking

Feb 09, 2019

Sampling-based speech parameter generation using moment-matching networks

Apr 12, 2017