Yongyi Zang

PromptReverb: Multimodal Room Impulse Response Generation Through Latent Rectified Flow Matching

Oct 25, 2025

Music Source Restoration

May 27, 2025

Training-Free Multi-Step Audio Source Separation

May 26, 2025

YuE: Scaling Open Foundation Models for Long-Form Music Generation

Mar 11, 2025

ASVspoof 5: Design, Collection and Validation of Resources for Spoofing, Deepfake, and Adversarial Attack Detection Using Crowdsourced Speech

Feb 13, 2025

Piano Transcription by Hierarchical Language Modeling with Pretrained Roll-based Encoders

Jan 07, 2025

SVDD 2024: The Inaugural Singing Voice Deepfake Detection Challenge

Aug 28, 2024

The Interpretation Gap in Text-to-Music Generation Models

Jul 14, 2024

CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection

Jun 04, 2024

Ambisonizer: Neural Upmixing as Spherical Harmonics Generation

May 22, 2024