Qiao Tian

U-Style: Cascading U-nets with Multi-level Speaker and Style Modeling for Zero-Shot Voice Cloning

Oct 06, 2023

AudioSR: Versatile Audio Super-resolution at Scale

Sep 13, 2023

MSM-VC: High-fidelity Source Style Transfer for Non-Parallel Voice Conversion by Multi-scale Style Modeling

Sep 03, 2023

DiCLET-TTS: Diffusion Model based Cross-lingual Emotion Transfer for Text-to-Speech -- A Study between English and Mandarin

Sep 02, 2023

AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining

Aug 10, 2023

LM-VC: Zero-shot Voice Conversion via Speech Generation based on Language Models

Jun 18, 2023

PolyVoice: Language Models for Speech to Speech Translation

Jun 13, 2023

Efficient Neural Music Generation

May 25, 2023

Multi-level Temporal-channel Speaker Retrieval for Robust Zero-shot Voice Conversion

May 12, 2023

Joint Multi-scale Cross-lingual Speaking Style Transfer with Bidirectional Attention Mechanism for Automatic Dubbing

May 09, 2023