Voice Conversion


Voice conversion is the process of transforming one speaker's voice so that it sounds like another speaker's, while preserving the linguistic content of the utterance.

A Unified Model For Voice and Accent Conversion In Speech and Singing using Self-Supervised Learning and Feature Extraction

Dec 11, 2024
Via arXiv

StableVC: Style Controllable Zero-Shot Voice Conversion with Conditional Flow Matching

Dec 10, 2024
Via arXiv

SKQVC: One-Shot Voice Conversion by K-Means Quantization with Self-Supervised Speech Representations

Nov 25, 2024
Via arXiv

Noro: A Noise-Robust One-shot Voice Conversion System with Hidden Speaker Representation Capabilities

Nov 29, 2024
Via arXiv

Region-Based Optimization in Continual Learning for Audio Deepfake Detection

Dec 16, 2024
Via arXiv

Building low-resource African language corpora: A case study of Kidawida, Kalenjin and Dholuo

Jan 19, 2025
Via arXiv

Zero-shot Voice Conversion with Diffusion Transformers

Nov 15, 2024
Via arXiv

The Codec Language Model-based Zero-Shot Spontaneous Style TTS System for CoVoC Challenge 2024

Dec 02, 2024
Via arXiv

Evaluating Spoken Language as a Biomarker for Automated Screening of Cognitive Impairment

Jan 30, 2025
Via arXiv

GLM-4-Voice: Towards Intelligent and Human-Like End-to-End Spoken Chatbot

Dec 03, 2024
Via arXiv