Sanyuan Chen

Autoregressive Speech Synthesis without Vector Quantization

Jul 11, 2024
Via arXiv

VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment

Jun 12, 2024
Via arXiv

VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers

Jun 08, 2024
Via arXiv

WavLLM: Towards Robust and Adaptive Speech Large Language Model

Mar 31, 2024
Via arXiv

SpeechX: Neural Codec Language Model as a Versatile Speech Transformer

Aug 14, 2023
Via arXiv

Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling

Mar 07, 2023
Via arXiv

Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers

Jan 05, 2023
Via arXiv

BEATs: Audio Pre-Training with Acoustic Tokenizers

Dec 18, 2022
Via arXiv

TESSP: Text-Enhanced Self-Supervised Speech Pre-training

Nov 24, 2022
Via arXiv

Exploring WavLM on Speech Enhancement

Nov 18, 2022
Via arXiv