Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Annamaria Mesaros

Domain-Agnostic Incremental Learning for Sound Classification. A DCASE 2026 Challenge task

Jun 01, 2026

Riccardo Casciotti, Manjunath Mulimani, Manu Harju, Jesper Rindom Jensen, Annamaria Mesaros

Abstract:This paper presents the Domain-Agnostic Incremental Learning for Audio Classification Task of the DCASE 2026 Challenge. Incremental learning refers to sequentially learning new tasks with the same system while maintaining its knowledge and performance on the previously learned task. Domain-incremental learning for sound classification refers to learning the same sound classes but in different acoustic domains, and was formalized as a data challenge for the first time in DCASE 2026. Participants will train a system to learn ten sound classes in three different domains, with learning at each incremental task not having access to previous task data. Submitted systems will be ranked by the overall average accuracy calculated over the three domains. The provided baseline system obtains a modest performance of 44.9\% accuracy over the three domains, mostly due to erroneous inference of the domain for the test sample.

* White paper. To be completed after the challenge deadline and submitted for the DCASE 2026 Workshop

Via

Access Paper or Ask Questions

Incremental learning for audio classification with Hebbian Deep Neural Networks

Apr 20, 2026

Riccardo Casciotti, Francesco De Santis, Alberto Antonietti, Annamaria Mesaros

Abstract:The ability of humans for lifelong learning is an inspiration for deep learning methods and in particular for continual learning. In this work, we apply Hebbian learning, a biologically inspired learning process, to sound classification. We propose a kernel plasticity approach that selectively modulates network kernels during incremental learning, acting on selected kernels to learn new information and on others to retain previous knowledge. Using the ESC-50 dataset, the proposed method achieves 76.3% overall accuracy over five incremental steps, outperforming a baseline without kernel plasticity (68.7%) and demonstrating significantly greater stability across tasks.

* ICASSP 2026

Via

Access Paper or Ask Questions

Online incremental learning for audio classification using a pretrained audio model

Aug 28, 2025

Manjunath Mulimani, Annamaria Mesaros

Figure 1 for Online incremental learning for audio classification using a pretrained audio model

Figure 2 for Online incremental learning for audio classification using a pretrained audio model

Figure 3 for Online incremental learning for audio classification using a pretrained audio model

Figure 4 for Online incremental learning for audio classification using a pretrained audio model

Abstract:Incremental learning aims to learn new tasks sequentially without forgetting the previously learned ones. Most of the existing incremental learning methods for audio focus on training the model from scratch on the initial task, and the same model is used to learn upcoming incremental tasks. The model is trained for several iterations to adapt to each new task, using some specific approaches to reduce the forgetting of old tasks. In this work, we propose a method for using generalizable audio embeddings produced by a pre-trained model to develop an online incremental learner that solves sequential audio classification tasks over time. Specifically, we inject a layer with a nonlinear activation function between the pre-trained model's audio embeddings and the classifier; this layer expands the dimensionality of the embeddings and effectively captures the distinct characteristics of sound classes. Our method adapts the model in a single forward pass (online) through the training samples of any task, with minimal forgetting of old tasks. We demonstrate the performance of the proposed method in two incremental learning setups: one class-incremental learning using ESC-50 and one domain-incremental learning of different cities from the TAU Urban Acoustic Scenes 2019 dataset; for both cases, the proposed approach outperforms other methods.

* Accepted to IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2025

Via

Access Paper or Ask Questions

Sound event detection with audio-text models and heterogeneous temporal annotations

Aug 28, 2025

Manu Harju, Annamaria Mesaros

Abstract:Recent advances in generating synthetic captions based on audio and related metadata allow using the information contained in natural language as input for other audio tasks. In this paper, we propose a novel method to guide a sound event detection system with free-form text. We use machine-generated captions as complementary information to the strong labels for training, and evaluate the systems using different types of textual inputs. In addition, we study a scenario where only part of the training data has strong labels, and the rest of it only has temporally weak labels. Our findings show that synthetic captions improve the performance in both cases compared to the CRNN architecture typically used for sound event detection. On a dataset of 50 highly unbalanced classes, the PSDS-1 score increases from 0.223 to 0.277 when trained with strong labels, and from 0.166 to 0.218 when half of the training data has only weak labels.

* Accepted to IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2025

Via

Access Paper or Ask Questions

Domain-Incremental Learning for Audio Classification

Dec 23, 2024

Manjunath Mulimani, Annamaria Mesaros

Abstract:In this work, we propose a method for domain-incremental learning for audio classification from a sequence of datasets recorded in different acoustic conditions. Fine-tuning a model on a sequence of evolving domains or datasets leads to forgetting of previously learned knowledge. On the other hand, freezing all the layers of the model leads to the model not adapting to the new domain. In this work, our novel dynamic network architecture keeps the shared homogeneous acoustic characteristics of domains, and learns the domain-specific acoustic characteristics in incremental steps. Our approach achieves a good balance between retaining the knowledge of previously learned domains and acquiring the knowledge of the new domain. We demonstrate the effectiveness of the proposed method on incremental learning of single-label classification of acoustic scenes from European cities and Korea, and multi-label classification of audio recordings from Audioset and FSD50K datasets. The proposed approach learns to classify acoustic scenes incrementally with an average accuracy of 71.9% for the order: European cities -> Korea, and 83.4% for Korea -> European cities. In a multi-label audio classification setup, it achieves an average lwlrap of 47.5% for Audioset -> FSD50K and 40.7% for FSD50K -> Audioset.

* Accepted to ICASSP 2025

Via

Access Paper or Ask Questions

Class-Incremental Learning for Sound Event Localization and Detection

Nov 19, 2024

Ruchi Pandey, Manjunath Mulimani, Archontis Politis, Annamaria Mesaros

Abstract:This paper investigates the feasibility of class-incremental learning (CIL) for Sound Event Localization and Detection (SELD) tasks. The method features an incremental learner that can learn new sound classes independently while preserving knowledge of old classes. The continual learning is achieved through a mean square error-based distillation loss to minimize output discrepancies between subsequent learners. The experiments are conducted on the TAU-NIGENS Spatial Sound Events 2021 dataset, which includes 12 different sound classes and demonstrate the efficacy of proposed method. We begin by learning 8 classes and introduce the 4 new classes at next stage. After the incremental phase, the system is evaluated on the full set of learned classes. Results show that, for this realistic dataset, our proposed method successfully maintains baseline performance across all metrics.

Via

Access Paper or Ask Questions

A decade of DCASE: Achievements, practices, evaluations and future challenges

Oct 07, 2024

Annamaria Mesaros, Romain Serizel, Toni Heittola, Tuomas Virtanen, Mark D. Plumbley

Abstract:This paper introduces briefly the history and growth of the Detection and Classification of Acoustic Scenes and Events (DCASE) challenge, workshop, research area and research community. Created in 2013 as a data evaluation challenge, DCASE has become a major research topic in the Audio and Acoustic Signal Processing area. Its success comes from a combination of factors: the challenge offers a large variety of tasks that are renewed each year; and the workshop offers a channel for dissemination of related work, engaging a young and dynamic community. At the same time, DCASE faces its own challenges, growing and expanding to different areas. One of the core principles of DCASE is open science and reproducibility: publicly available datasets, baseline systems, technical reports and workshop publications. While the DCASE challenge and workshop are independent of IEEE SPS, the challenge receives annual endorsement from the AASP TC, and the DCASE community contributes significantly to the ICASSP flagship conference and the success of SPS in many of its activities.

* Submitted to ICASSP 2025

Via

Access Paper or Ask Questions

Computer Audition: From Task-Specific Machine Learning to Foundation Models

Jul 22, 2024

Andreas Triantafyllopoulos, Iosif Tsangko, Alexander Gebhard, Annamaria Mesaros, Tuomas Virtanen, Björn Schuller

Abstract:Foundation models (FMs) are increasingly spearheading recent advances on a variety of tasks that fall under the purview of computer audition -- the use of machines to understand sounds. They feature several advantages over traditional pipelines: among others, the ability to consolidate multiple tasks in a single model, the option to leverage knowledge from other modalities, and the readily-available interaction with human users. Naturally, these promises have created substantial excitement in the audio community, and have led to a wave of early attempts to build new, general-purpose foundation models for audio. In the present contribution, we give an overview of computational audio analysis as it transitions from traditional pipelines towards auditory foundation models. Our work highlights the key operating principles that underpin those models, and showcases how they can accommodate multiple tasks that the audio community previously tackled separately.

Via

Access Paper or Ask Questions

Online Domain-Incremental Learning Approach to Classify Acoustic Scenes in All Locations

Jun 19, 2024

Manjunath Mulimani, Annamaria Mesaros

Figure 1 for Online Domain-Incremental Learning Approach to Classify Acoustic Scenes in All Locations

Figure 2 for Online Domain-Incremental Learning Approach to Classify Acoustic Scenes in All Locations

Figure 3 for Online Domain-Incremental Learning Approach to Classify Acoustic Scenes in All Locations

Figure 4 for Online Domain-Incremental Learning Approach to Classify Acoustic Scenes in All Locations

Abstract:In this paper, we propose a method for online domain-incremental learning of acoustic scene classification from a sequence of different locations. Simply training a deep learning model on a sequence of different locations leads to forgetting of previously learned knowledge. In this work, we only correct the statistics of the Batch Normalization layers of a model using a few samples to learn the acoustic scenes from a new location without any excessive training. Experiments are performed on acoustic scenes from 11 different locations, with an initial task containing acoustic scenes from 6 locations and the remaining 5 incremental tasks each representing the acoustic scenes from a different location. The proposed approach outperforms fine-tuning based methods and achieves an average accuracy of 48.8% after learning the last task in sequence without forgetting acoustic scenes from the previously learned locations.

* Accepted to EUSIPCO 2024

Via

Access Paper or Ask Questions

DCASE 2024 Task 4: Sound Event Detection with Heterogeneous Data and Missing Labels

Jun 12, 2024

Samuele Cornell, Janek Ebbers, Constance Douwes, Irene Martín-Morató, Manu Harju, Annamaria Mesaros, Romain Serizel

Figure 1 for DCASE 2024 Task 4: Sound Event Detection with Heterogeneous Data and Missing Labels

Figure 2 for DCASE 2024 Task 4: Sound Event Detection with Heterogeneous Data and Missing Labels

Figure 3 for DCASE 2024 Task 4: Sound Event Detection with Heterogeneous Data and Missing Labels

Abstract:The Detection and Classification of Acoustic Scenes and Events Challenge Task 4 aims to advance sound event detection (SED) systems in domestic environments by leveraging training data with different supervision uncertainty. Participants are challenged in exploring how to best use training data from different domains and with varying annotation granularity (strong/weak temporal resolution, soft/hard labels), to obtain a robust SED system that can generalize across different scenarios. Crucially, annotation across available training datasets can be inconsistent and hence sound labels of one dataset may be present but not annotated in the other one and vice-versa. As such, systems will have to cope with potentially missing target labels during training. Moreover, as an additional novelty, systems will also be evaluated on labels with different granularity in order to assess their robustness for different applications. To lower the entry barrier for participants, we developed an updated baseline system with several caveats to address these aforementioned problems. Results with our baseline system indicate that this research direction is promising and is possible to obtain a stronger SED system by using diverse domain training data with missing labels compared to training a SED system for each domain separately.

Via

Access Paper or Ask Questions