Dan Su

Celine

Glow-WaveGAN: Learning Speech Representations from GAN-based Variational Auto-Encoder For High Fidelity Flow-based Speech Synthesis

Jun 22, 2021
Via arXiv

Controllable Context-aware Conversational Speech Synthesis

Jun 21, 2021
Via arXiv

GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio

Jun 13, 2021
Via arXiv

Spoken Style Learning with Multi-modal Hierarchical Context Encoding for Conversational Text-to-Speech Synthesis

Jun 11, 2021
Via arXiv

Raw Waveform Encoder with Multi-Scale Globally Attentive Locally Recurrent Networks for End-to-End Speech Recognition

Jun 08, 2021
Via arXiv

Improve Query Focused Abstractive Summarization by Incorporating Answer Relevance

May 31, 2021
Via arXiv

DiffSVC: A Diffusion Probabilistic Model for Singing Voice Conversion

May 28, 2021
Via arXiv

Retrieval-Free Knowledge-Grounded Dialogue Response Generation with Adapters

May 13, 2021
Via arXiv

Latency-Controlled Neural Architecture Search for Streaming Speech Recognition

May 08, 2021
Via arXiv

SpeechMoE: Scaling to Large Acoustic Models with Dynamic Routing Mixture of Experts

May 07, 2021
Via arXiv