Igor Pereira

Diff-VS: Efficient Audio-Aware Diffusion U-Net for Vocals Separation

Apr 01, 2026

Yun-Ning Hung, Richard Vogl, Filip Korzeniowski, Igor Pereira

Abstract: While diffusion models are best known for their performance in generative tasks, they have also been successfully applied to many other tasks, including audio source separation. However, current generative approaches to music source separation often underperform on standard objective metrics. In this paper, we address this issue by introducing a novel generative vocal separation model based on the Elucidated Diffusion Model (EDM) framework. Our model processes complex short-time Fourier transform spectrograms and employs an improved U-Net architecture based on music-informed design choices. Our approach matches discriminative baselines on objective metrics and achieves perceptual quality comparable to state-of-the-art systems, as assessed by proxy subjective metrics. We hope these results encourage broader exploration of generative methods for music source separation.

* Accepted at ICASSP 2026

Moisesdb: A dataset for source separation beyond 4-stems

Jul 29, 2023

Igor Pereira, Felipe Araújo, Filip Korzeniowski, Richard Vogl

Abstract: In this paper, we introduce the MoisesDB dataset for musical source separation. It consists of 240 tracks from 45 artists, covering twelve musical genres. For each song, we provide its individual audio sources, organized in a two-level hierarchical taxonomy of stems. This will facilitate building and evaluating fine-grained source separation systems that go beyond the limitation of using four stems (drums, bass, other, and vocals) due to lack of data. To facilitate the adoption of this dataset, we publish an easy-to-use Python library to download, process, and use MoisesDB. Alongside thorough documentation and analysis of the dataset contents, this work provides baseline results for open-source separation models at varying separation granularities (four, five, and six stems) and discusses their results.
