Longfei Song

Exploring the Potential of SSL Models for Sound Event Detection

May 17, 2025

Hanfang Cui, Longfei Song, Li Li, Dongxing Xu, Yanhua Long

Abstract: Self-supervised learning (SSL) models offer powerful representations for sound event detection (SED), yet their synergistic potential remains underexplored. This study systematically evaluates state-of-the-art SSL models to guide optimal model selection and integration for SED. We propose a framework that combines heterogeneous SSL representations (e.g., BEATs, HuBERT, WavLM) through three fusion strategies: individual SSL embedding integration, dual-modal fusion, and full aggregation. Experiments on the DCASE 2023 Task 4 Challenge reveal that dual-modal fusion (e.g., CRNN+BEATs+WavLM) achieves complementary performance gains, while CRNN+BEATs alone delivers the best results among individual SSL models. We further introduce normalized sound event bounding boxes (nSEBBs), an adaptive post-processing method that dynamically adjusts event boundary predictions, improving PSDS1 by up to 4% for standalone SSL models. These findings highlight the compatibility and complementarity of SSL architectures, providing guidance for task-specific fusion and robust SED system design.

* 27 pages, 5 figures, submitted to the Journal of King Saud University - Computer and Information Sciences (under review)

ICSD: An Open-source Dataset for Infant Cry and Snoring Detection

Aug 20, 2024

Qingyu Liu, Longfei Song, Dongxing Xu, Yanhua Long

Abstract: The detection and analysis of infant cry and snoring events are crucial tasks within the field of audio signal processing. While existing datasets for general sound event detection are plentiful, they often fall short in providing sufficient, strongly labeled data specific to infant cries and snoring. To provide a benchmark dataset and thus foster research on infant cry and snoring detection, this paper introduces the Infant Cry and Snoring Detection (ICSD) dataset, a novel, publicly available dataset specifically designed for ICSD tasks. The ICSD comprises three types of subsets: a real strongly labeled subset with event-based labels annotated manually, a weakly labeled subset with only clip-level event annotations, and a synthetic subset generated and labeled with strong annotations. This paper provides a detailed description of the ICSD creation process, including the challenges encountered and the solutions adopted. We offer a comprehensive characterization of the dataset, discussing its limitations and key factors for ICSD usage. Additionally, we conduct extensive experiments on the ICSD dataset to establish baseline systems and offer insights into the main factors to consider when using this dataset for ICSD research. Our goal is to develop a dataset that will be widely adopted by the community as a new open benchmark for future ICSD research.

* 11 pages, 6 figures
