Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Singh

Hierarchical Diffusion Models for Singing Voice Neural Vocoder

Oct 18, 2022

Naoya Takahashi, Mayank Kumar, Singh, Yuki Mitsufuji

Figure 1 for Hierarchical Diffusion Models for Singing Voice Neural Vocoder

Figure 2 for Hierarchical Diffusion Models for Singing Voice Neural Vocoder

Figure 3 for Hierarchical Diffusion Models for Singing Voice Neural Vocoder

Figure 4 for Hierarchical Diffusion Models for Singing Voice Neural Vocoder

Abstract:Recent progress in deep generative models has improved the quality of neural vocoders in speech domain. However, generating a high-quality singing voice remains challenging due to a wider variety of musical expressions in pitch, loudness, and pronunciations. In this work, we propose a hierarchical diffusion model for singing voice neural vocoders. The proposed method consists of multiple diffusion models operating in different sampling rates; the model at the lowest sampling rate focuses on generating accurate low-frequency components such as pitch, and other models progressively generate the waveform at higher sampling rates on the basis of the data at the lower sampling rate and acoustic features. Experimental results show that the proposed method produces high-quality singing voices for multiple singers, outperforming state-of-the-art neural vocoders with a similar range of computational costs.

Via

Access Paper or Ask Questions