
Hiroshi Saruwatari

Empirical Study Incorporating Linguistic Knowledge on Filled Pauses for Personalized Spontaneous Speech Synthesis

Oct 14, 2022

Hyperbolic Timbre Embedding for Musical Instrument Sound Synthesis Based on Variational Autoencoders

Sep 27, 2022

Multi-Task Adversarial Training Algorithm for Multi-Speaker Neural Text-to-Speech

Sep 26, 2022

Head-Related Transfer Function Interpolation from Spatially Sparse Measurements Using Autoencoder with Source Position Conditioning

Jul 22, 2022

Physics-informed convolutional neural network with bicubic spline interpolation for sound field estimation

Jul 22, 2022

Exploring the Effectiveness of Self-supervised Learning and Classifier Chains in Emotion Recognition of Nonverbal Vocalizations

Jun 21, 2022

Human-in-the-loop Speaker Adaptation for DNN-based Multi-speaker TTS

Jun 21, 2022

Acoustic Modeling for End-to-End Empathetic Dialogue Speech Synthesis Using Linguistic and Prosodic Contexts of Dialogue History

Jun 16, 2022

Region-to-region kernel interpolation of acoustic transfer function with directional weighting

May 05, 2022

Speaking-Rate-Controllable HiFi-GAN Using Feature Interpolation

Apr 22, 2022