Zhiyao Duan

Towards Perception-Informed Latent HRTF Representations

Jul 03, 2025
Via arXiv

A Review on Score-based Generative Models for Audio Applications

Jun 10, 2025
Via arXiv

HARP 2.0: Expanding Hosted, Asynchronous, Remote Processing for Deep Learning in the DAW

Mar 04, 2025
Via arXiv

Audio Visual Segmentation Through Text Embeddings

Feb 22, 2025
Via arXiv

SVDD 2024: The Inaugural Singing Voice Deepfake Detection Challenge

Aug 28, 2024
Via arXiv

Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition

Aug 17, 2024
Via arXiv

A Multi-Stream Fusion Approach with One-Class Learning for Audio-Visual Deepfake Detection

Jun 20, 2024
Via arXiv

GTR-Voice: Articulatory Phonetics Informed Controllable Expressive Speech Synthesis

Jun 15, 2024
Via arXiv

Articulatory Phonetics Informed Controllable Expressive Speech Synthesis

Jun 15, 2024
Via arXiv

CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection

Jun 04, 2024
Via arXiv