Picture for Sheng Zhao

Sheng Zhao

TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation

Add code
May 28, 2024
Viaarxiv icon

CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations

Add code
Apr 10, 2024
Figure 1 for CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations
Figure 2 for CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations
Figure 3 for CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations
Figure 4 for CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations
Viaarxiv icon

RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis

Add code
Apr 06, 2024
Figure 1 for RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis
Figure 2 for RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis
Figure 3 for RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis
Figure 4 for RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis
Viaarxiv icon

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Add code
Mar 05, 2024
Viaarxiv icon

Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like

Add code
Feb 12, 2024
Viaarxiv icon

GAIA: Zero-shot Talking Avatar Generation

Add code
Nov 26, 2023
Figure 1 for GAIA: Zero-shot Talking Avatar Generation
Figure 2 for GAIA: Zero-shot Talking Avatar Generation
Figure 3 for GAIA: Zero-shot Talking Avatar Generation
Figure 4 for GAIA: Zero-shot Talking Avatar Generation
Viaarxiv icon

UniAudio: An Audio Foundation Model Toward Universal Audio Generation

Add code
Oct 11, 2023
Figure 1 for UniAudio: An Audio Foundation Model Toward Universal Audio Generation
Figure 2 for UniAudio: An Audio Foundation Model Toward Universal Audio Generation
Figure 3 for UniAudio: An Audio Foundation Model Toward Universal Audio Generation
Figure 4 for UniAudio: An Audio Foundation Model Toward Universal Audio Generation
Viaarxiv icon

MuLanTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2023

Add code
Sep 12, 2023
Viaarxiv icon

Large-Scale Automatic Audiobook Creation

Add code
Sep 07, 2023
Figure 1 for Large-Scale Automatic Audiobook Creation
Viaarxiv icon

PromptTTS 2: Describing and Generating Voices with Text Prompt

Add code
Sep 05, 2023
Figure 1 for PromptTTS 2: Describing and Generating Voices with Text Prompt
Figure 2 for PromptTTS 2: Describing and Generating Voices with Text Prompt
Figure 3 for PromptTTS 2: Describing and Generating Voices with Text Prompt
Figure 4 for PromptTTS 2: Describing and Generating Voices with Text Prompt
Viaarxiv icon