Yuanzhe Chen

Improving Audio Generation with Visual Enhanced Caption

Jul 05, 2024

Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

Jun 04, 2024

T-CLAP: Temporal-Enhanced Contrastive Language-Audio Pretraining

Apr 27, 2024

StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion

Feb 07, 2024

LM-VC: Zero-shot Voice Conversion via Speech Generation based on Language Models

Jun 18, 2023

Multi-level Temporal-channel Speaker Retrieval for Robust Zero-shot Voice Conversion

May 12, 2023

Non-parallel Accent Conversion using Pseudo Siamese Disentanglement Network

Dec 12, 2022

Delivering Speaking Style in Low-resource Voice Conversion with Multi-factor Constraints

Nov 16, 2022

Streaming Voice Conversion Via Intermediate Bottleneck Features And Non-streaming Teacher Guidance

Oct 27, 2022

Cloning one's voice using very limited data in the wild

Oct 08, 2021