Xinfa Zhu

Vec-Tok-VC+: Residual-enhanced Robust Zero-shot Voice Conversion with Progressive Constraints in a Dual-mode Training Strategy

Jun 14, 2024
Via arXiv

Text-aware and Context-aware Expressive Audiobook Speech Synthesis

Jun 12, 2024
Via arXiv

Single-Codec: Single-Codebook Speech Codec towards High-Performance Speech Generation

Jun 11, 2024
Via arXiv

Accent-VITS: accent transfer for end-to-end TTS

Dec 29, 2023
Via arXiv

SELM: Speech Enhancement Using Discrete Tokens and Language Models

Dec 15, 2023
Via arXiv

SponTTS: modeling and transferring spontaneous style for TTS

Nov 13, 2023
Via arXiv

Multi-Speaker Expressive Speech Synthesis via Semi-supervised Contrastive Learning

Oct 26, 2023
Via arXiv

Vec-Tok Speech: speech vectorization and tokenization for neural speech generation

Oct 12, 2023
Via arXiv

Zero-Shot Emotion Transfer For Cross-Lingual Speech Synthesis

Oct 06, 2023
Via arXiv

U-Style: Cascading U-nets with Multi-level Speaker and Style Modeling for Zero-Shot Voice Cloning

Oct 06, 2023
Via arXiv