Picture for Dongchao Yang

Dongchao Yang

UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner

Add code
Jun 14, 2024
Figure 1 for UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner
Figure 2 for UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner
Figure 3 for UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner
Figure 4 for UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner
Viaarxiv icon

CoLM-DSR: Leveraging Neural Codec Language Modeling for Multi-Modal Dysarthric Speech Reconstruction

Add code
Jun 12, 2024
Figure 1 for CoLM-DSR: Leveraging Neural Codec Language Modeling for Multi-Modal Dysarthric Speech Reconstruction
Figure 2 for CoLM-DSR: Leveraging Neural Codec Language Modeling for Multi-Modal Dysarthric Speech Reconstruction
Figure 3 for CoLM-DSR: Leveraging Neural Codec Language Modeling for Multi-Modal Dysarthric Speech Reconstruction
Figure 4 for CoLM-DSR: Leveraging Neural Codec Language Modeling for Multi-Modal Dysarthric Speech Reconstruction
Viaarxiv icon

Addressing Index Collapse of Large-Codebook Speech Tokenizer with Dual-Decoding Product-Quantized Variational Auto-Encoder

Add code
Jun 05, 2024
Figure 1 for Addressing Index Collapse of Large-Codebook Speech Tokenizer with Dual-Decoding Product-Quantized Variational Auto-Encoder
Figure 2 for Addressing Index Collapse of Large-Codebook Speech Tokenizer with Dual-Decoding Product-Quantized Variational Auto-Encoder
Figure 3 for Addressing Index Collapse of Large-Codebook Speech Tokenizer with Dual-Decoding Product-Quantized Variational Auto-Encoder
Figure 4 for Addressing Index Collapse of Large-Codebook Speech Tokenizer with Dual-Decoding Product-Quantized Variational Auto-Encoder
Viaarxiv icon

SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models

Add code
Jun 04, 2024
Viaarxiv icon

RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis

Add code
Apr 06, 2024
Viaarxiv icon

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Add code
Mar 05, 2024
Figure 1 for NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
Figure 2 for NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
Figure 3 for NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
Figure 4 for NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
Viaarxiv icon

Consistent and Relevant: Rethink the Query Embedding in General Sound Separation

Add code
Dec 24, 2023
Viaarxiv icon

UniAudio: An Audio Foundation Model Toward Universal Audio Generation

Add code
Oct 11, 2023
Figure 1 for UniAudio: An Audio Foundation Model Toward Universal Audio Generation
Figure 2 for UniAudio: An Audio Foundation Model Toward Universal Audio Generation
Figure 3 for UniAudio: An Audio Foundation Model Toward Universal Audio Generation
Figure 4 for UniAudio: An Audio Foundation Model Toward Universal Audio Generation
Viaarxiv icon

DPM-TSE: A Diffusion Probabilistic Model for Target Sound Extraction

Add code
Oct 10, 2023
Viaarxiv icon

PromptTTS 2: Describing and Generating Voices with Text Prompt

Add code
Sep 05, 2023
Figure 1 for PromptTTS 2: Describing and Generating Voices with Text Prompt
Figure 2 for PromptTTS 2: Describing and Generating Voices with Text Prompt
Figure 3 for PromptTTS 2: Describing and Generating Voices with Text Prompt
Figure 4 for PromptTTS 2: Describing and Generating Voices with Text Prompt
Viaarxiv icon