Title: Tacotron: Towards End-to-End Speech Synthesis

Apr 06, 2017

Yuxuan Wang, RJ Skerry-Ryan, Daisy Stanton, Yonghui Wu, Ron J. Weiss, Navdeep Jaitly, Zongheng Yang, Ying Xiao, Zhifeng Chen, Samy Bengio (+4 more)

Abstract: A text-to-speech synthesis system typically consists of multiple stages, such as a text analysis frontend, an acoustic model and an audio synthesis module. Building these components often requires extensive domain expertise and may contain brittle design choices. In this paper, we present Tacotron, an end-to-end generative text-to-speech model that synthesizes speech directly from characters. Given <text, audio> pairs, the model can be trained completely from scratch with random initialization. We present several key techniques to make the sequence-to-sequence framework perform well for this challenging task. Tacotron achieves a 3.82 subjective 5-scale mean opinion score on US English, outperforming a production parametric system in terms of naturalness. In addition, since Tacotron generates speech at the frame level, it's substantially faster than sample-level autoregressive methods.
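
The speed claim in the abstract's last sentence comes down to counting autoregressive steps: a sample-level model takes one step per audio sample, while a frame-level model takes one step per spectrogram frame. The following is a rough sketch of that arithmetic only. The sample rate and hop size are illustrative assumptions (they are not stated in the abstract), and the function names are hypothetical.

```python
import math

# Illustrative assumptions, not values taken from the abstract:
SAMPLE_RATE_HZ = 24_000   # assumed audio sample rate
HOP_SAMPLES = 300         # assumed frame shift in samples (12.5 ms at 24 kHz)


def sample_level_steps(duration_s: float, sample_rate_hz: int) -> int:
    """Autoregressive steps if the model emits one audio sample per step."""
    return round(duration_s * sample_rate_hz)


def frame_level_steps(duration_s: float, sample_rate_hz: int, hop_samples: int) -> int:
    """Autoregressive steps if the model emits one spectrogram frame per step."""
    total_samples = sample_level_steps(duration_s, sample_rate_hz)
    return math.ceil(total_samples / hop_samples)


if __name__ == "__main__":
    duration_s = 5.0  # a hypothetical 5-second utterance
    n_sample = sample_level_steps(duration_s, SAMPLE_RATE_HZ)
    n_frame = frame_level_steps(duration_s, SAMPLE_RATE_HZ, HOP_SAMPLES)
    print(f"sample-level steps: {n_sample}")
    print(f"frame-level steps:  {n_frame}")
    print(f"step-count ratio:   {n_sample / n_frame:.0f}x")
```

Under these assumed settings the frame-level model needs a few hundred sequential steps where a sample-level model needs over a hundred thousand. Step count is not the only factor in wall-clock speed, since per-step cost differs between architectures, so this is an intuition aid and not a benchmark.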

* Submitted to Interspeech 2017. v2 changed paper title to be consistent with our conference submission (no content change other than typo fixes)
