Sheng Zhao

VAST: Vivify Your Talking Avatar via Zero-Shot Expressive Facial Style Transfer

Aug 11, 2023

The detection and rectification for identity-switch based on unfalsified control

Jul 27, 2023

An End-to-End Multi-Module Audio Deepfake Generation System for ADD Challenge 2023

Jul 03, 2023

ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading

Jul 03, 2023

NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers

May 04, 2023

DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder

Apr 23, 2023

AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models

Apr 05, 2023

HiFace: High-Fidelity 3D Face Reconstruction by Learning Static and Dynamic Details

Mar 20, 2023

FoundationTTS: Text-to-Speech for ASR Customization with Generative Language Model

Mar 08, 2023

Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling

Mar 07, 2023