Yuepeng Jiang

REF-VC: Robust, Expressive and Fast Zero-Shot Voice Conversion with Diffusion Transformers

Aug 07, 2025
Via arXiv

SongEval: A Benchmark Dataset for Song Aesthetics Evaluation

May 16, 2025
Via arXiv

DiffRhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion

Mar 03, 2025
Via arXiv

Drop the beat! Freestyler for Accompaniment Conditioned Rapping Voice Generation

Aug 28, 2024
Via arXiv

Towards Expressive Zero-Shot Speech Synthesis with Hierarchical Prosody Modeling

Jun 11, 2024
Via arXiv

WenetSpeech4TTS: A 12,800-hour Mandarin TTS Corpus for Large Speech Generation Model Benchmark

Jun 11, 2024
Via arXiv

VITS-Based Singing Voice Conversion Leveraging Whisper and multi-scale F0 Modeling

Oct 04, 2023
Via arXiv

DualVC 2: Dynamic Masked Convolution for Unified Streaming and Non-Streaming Voice Conversion

Sep 27, 2023
Via arXiv

HiGNN-TTS: Hierarchical Prosody Modeling with Graph Neural Networks for Expressive Long-form TTS

Sep 25, 2023
Via arXiv

DualVC: Dual-mode Voice Conversion using Intra-model Knowledge Distillation and Hybrid Predictive Coding

May 21, 2023
Via arXiv