RJ Skerry-Ryan

LMs with a Voice: Spoken Language Modeling beyond Speech Tokens

May 24, 2023
Eliya Nachmani, Alon Levkovitch, Julian Salazar, Chulayuth Asawaroengchai, Soroosh Mariooryad, RJ Skerry-Ryan, Michelle Tadmor Ramanovich


Learning the joint distribution of two sequences using little or no paired data

Dec 06, 2022
Soroosh Mariooryad, Matt Shannon, Siyuan Ma, Tom Bagby, David Kao, Daisy Stanton, Eric Battenberg, RJ Skerry-Ryan


Speaker Generation

Nov 07, 2021
Daisy Stanton, Matt Shannon, Soroosh Mariooryad, RJ Skerry-Ryan, Eric Battenberg, Tom Bagby, David Kao


Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling

Apr 13, 2021
Isaac Elias, Heiga Zen, Jonathan Shen, Yu Zhang, Ye Jia, RJ Skerry-Ryan, Yonghui Wu


Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis

Nov 06, 2020
Ron J. Weiss, RJ Skerry-Ryan, Eric Battenberg, Soroosh Mariooryad, Diederik P. Kingma


Non-saturating GAN training as divergence minimization

Oct 15, 2020
Matt Shannon, Ben Poole, Soroosh Mariooryad, Tom Bagby, Eric Battenberg, David Kao, Daisy Stanton, RJ Skerry-Ryan


Location-Relative Attention Mechanisms For Robust Long-Form Speech Synthesis

Oct 23, 2019
Eric Battenberg, RJ Skerry-Ryan, Soroosh Mariooryad, Daisy Stanton, David Kao, Matt Shannon, Tom Bagby


Semi-Supervised Generative Modeling for Controllable Speech Synthesis

Oct 03, 2019
Raza Habib, Soroosh Mariooryad, Matt Shannon, Eric Battenberg, RJ Skerry-Ryan, Daisy Stanton, David Kao, Tom Bagby


Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning

Jul 24, 2019
Yu Zhang, Ron J. Weiss, Heiga Zen, Yonghui Wu, Zhifeng Chen, RJ Skerry-Ryan, Ye Jia, Andrew Rosenberg, Bhuvana Ramabhadran
