Voice Conversion


Voice conversion is the task of transforming speech from a source speaker so that it sounds like a target speaker while preserving the linguistic content.

RobustSVC: HuBERT-based Melody Extractor and Adversarial Learning for Robust Singing Voice Conversion

Sep 10, 2024
Via arXiv

Disentangling the Prosody and Semantic Information with Pre-trained Model for In-Context Learning based Zero-Shot Voice Conversion

Sep 10, 2024
Via arXiv

DiffEVC: Any-to-Any Emotion Voice Conversion with Expressive Guidance

Sep 05, 2024
Via arXiv

Hear Your Face: Face-based voice conversion with F0 estimation

Aug 19, 2024
Via arXiv

Improvement Speaker Similarity for Zero-Shot Any-to-Any Voice Conversion of Whispered and Regular Speech

Aug 21, 2024
Via arXiv

EmoAttack: Utilizing Emotional Voice Conversion for Speech Backdoor Attacks on Deep Speech Classification Models

Sep 06, 2024
Via arXiv

Pureformer-VC: Non-parallel One-Shot Voice Conversion with Pure Transformer Blocks and Triplet Discriminative Training

Sep 03, 2024
Via arXiv

Improving curriculum learning for target speaker extraction with synthetic speakers

Oct 01, 2024
Via arXiv

Seeing Your Speech Style: A Novel Zero-Shot Identity-Disentanglement Face-based Voice Conversion

Sep 01, 2024
Via arXiv

RAVE for Speech: Efficient Voice Conversion at High Sampling Rates

Aug 29, 2024
Via arXiv