Zhiyao Duan

GTR-Voice: Articulatory Phonetics Informed Controllable Expressive Speech Synthesis

Jun 15, 2024
Via arXiv

Articulatory Phonetics Informed Controllable Expressive Speech Synthesis

Jun 15, 2024
Via arXiv

CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection

Jun 04, 2024
Via arXiv

SVDD Challenge 2024: A Singing Voice Deepfake Detection Challenge Evaluation Plan

May 08, 2024
Via arXiv

Scoring Intervals using Non-Hierarchical Transformer For Automatic Piano Transcription

Apr 17, 2024
Via arXiv

MusicHiFi: Fast High-Fidelity Stereo Vocoding

Mar 20, 2024
Via arXiv

Toward Fully Self-Supervised Multi-Pitch Estimation

Feb 23, 2024
Via arXiv

Cacophony: An Improved Contrastive Audio-Text Model

Feb 10, 2024
Via arXiv

Learning Arousal-Valence Representation from Categorical Emotion Labels of Speech

Nov 24, 2023
Via arXiv

EDMSound: Spectrogram Based Diffusion Models for Efficient and High-Quality Audio Synthesis

Nov 18, 2023
Via arXiv