Topic:Unsupervised Anomaly Detection In Sound
What is Unsupervised Anomaly Detection In Sound? Unsupervised anomaly detection in sound is the process of identifying unusual or abnormal sounds without using labeled examples.
Papers and Code
Jan 27, 2025
Abstract:This paper presents a novel approach for hazard analysis in dashcam footage, addressing the detection of driver reactions to hazards, the identification of hazardous objects, and the generation of descriptive captions. We first introduce a method for detecting driver reactions through speed and sound anomaly detection, leveraging unsupervised learning techniques. For hazard detection, we employ a set of heuristic rules as weak classifiers, which are combined using an ensemble method. This ensemble approach is further refined with differential privacy to mitigate overconfidence, ensuring robustness despite the lack of labeled data. Lastly, we use state-of-the-art vision-language models for hazard captioning, generating descriptive labels for the detected hazards. Our method achieved the highest scores in the Challenge on Out-of-Label in Autonomous Driving, demonstrating its effectiveness across all three tasks. Source codes are publicly available at https://github.com/ffyyytt/COOOL_2025.
* 5 pages, WACV 2025
Via

Oct 11, 2024
Abstract:In this work, a novel deep neural network, designed to enhance the efficiency and effectiveness of unsupervised sound anomaly detection, is presented. The proposed model exploits an attention module and separable convolutions to identify salient time-frequency patterns in audio data to discriminate between normal and anomalous sounds with reduced computational complexity. The approach is validated through extensive experiments using the Task 2 dataset of the DCASE 2020 challenge. Results demonstrate superior performance in terms of anomaly detection accuracy while having fewer parameters than state-of-the-art methods. Implementation details, code, and pre-trained models are available in https://github.com/michaelneri/unsupervised-audio-anomaly-detection.
* Accepted for publication in IEEE Sensors Letters. 4 pages, 4 figures
Via

Nov 05, 2024
Abstract:The proposed system consists of a two-stage cascade. The first stage performs a rough heartbeat detection while the second stage refines the previous one, improving the temporal localization and also classifying the heartbeats into types S1 and S2. The first contribution is a novel approach that combines the dissimilarity matrix with the frame-level spectral divergence to locate heartbeats using the repetitiveness shown by the heart sounds and the temporal relationships between the intervals defined by the events S1/S2 and non-S1/S2 (systole and diastole). The second contribution is a verification-correction-classification process based on a sliding window that allows the preservation of the temporal structure of the cardiac cycle in order to be applied in the heart sound classification. The proposed method has been assessed using the open access databases PASCAL, CirCor DigiScope Phonocardiogram and an additional sound mixing procedure considering both Additive White Gaussian Noise (AWGN) and different kinds of clinical ambient noises from a commercial database. The proposed method provides the best detection/classification performance in realistic scenarios where the presence of cardiac anomalies as well as different types of clinical environmental noises are active in the PCG signal. Of note, the promising modelling of the temporal structures of the heart provided by the dissimilarity matrix together with the frame-level spectral divergence, as well as the removal of a significant number of spurious heart events and recovery of missing heart events, both corrected by the proposed verification-correction-classification algorithm, suggest that our proposal is a successful tool to be applied in heart segmentation.
* Computer Methods and Programs in Biomedicine, vol 221 (2022)
106909
Via

Sep 24, 2024
Abstract:Unsupervised Anomalous Sound Detection (ASD) aims to design a generalizable method that can be used to detect anomalies when only normal sounds are given. In this paper, Anomalous Sound Detection based on Diffusion Models (ASD-Diffusion) is proposed for ASD in real-world factories. In our pipeline, the anomalies in acoustic features are reconstructed from their noisy corrupted features into their approximate normal pattern. Secondly, a post-processing anomalies filter algorithm is proposed to detect anomalies that exhibit significant deviation from the original input after reconstruction. Furthermore, denoising diffusion implicit model is introduced to accelerate the inference speed by a longer sampling interval of the denoising process. The proposed method is innovative in the application of diffusion models as a new scheme. Experimental results on the development set of DCASE 2023 challenge task 2 outperform the baseline by 7.75%, demonstrating the effectiveness of the proposed method.
* This paper will appear at ICPR 2024
Via

Oct 22, 2023
Abstract:First-shot (FS) unsupervised anomalous sound detection (ASD) is a brand-new task introduced in DCASE 2023 Challenge Task 2, where the anomalous sounds for the target machine types are unseen in training. Existing methods often rely on the availability of normal and abnormal sound data from the target machines. However, due to the lack of anomalous sound data for the target machine types, it becomes challenging when adapting the existing ASD methods to the first-shot task. In this paper, we propose a new framework for the first-shot unsupervised ASD, where metadata-assisted audio generation is used to estimate unknown anomalies, by utilising the available machine information (i.e., metadata and sound data) to fine-tune a text-to-audio generation model for generating the anomalous sounds that contain unique acoustic characteristics accounting for each different machine types. We then use the method of Time-Weighted Frequency domain audio Representation with Gaussian Mixture Model (TWFR-GMM) as the backbone to achieve the first-shot unsupervised ASD. Our proposed FS-TWFR-GMM method achieves competitive performance amongst top systems in DCASE 2023 Challenge Task 2, while requiring only 1% model parameters for detection, as validated in our experiments.
* Submitted to ICASSP 2024
Via

Oct 13, 2023
Abstract:Unsupervised anomalous sound detection (ASD) aims to detect unknown anomalous sounds of devices when only normal sound data is available. The autoencoder (AE) and self-supervised learning based methods are two mainstream methods. However, the AE-based methods could be limited as the feature learned from normal sounds can also fit with anomalous sounds, reducing the ability of the model in detecting anomalies from sound. The self-supervised methods are not always stable and perform differently, even for machines of the same type. In addition, the anomalous sound may be short-lived, making it even harder to distinguish from normal sound. This paper proposes an ID constrained Transformer-based autoencoder (IDC-TransAE) architecture with weighted anomaly score computation for unsupervised ASD. Machine ID is employed to constrain the latent space of the Transformer-based autoencoder (TransAE) by introducing a simple ID classifier to learn the difference in the distribution for the same machine type and enhance the ability of the model in distinguishing anomalous sound. Moreover, weighted anomaly score computation is introduced to highlight the anomaly scores of anomalous events that only appear for a short time. Experiments performed on DCASE 2020 Challenge Task2 development dataset demonstrate the effectiveness and superiority of our proposed method.
* Accepted by EURASIP Journal on Audio, Speech, and Music Processing
Via

Oct 24, 2023
Abstract:Dependence on raw materials, especially in the mining sector, is a key part of today's economy. Aggregates are vital, being the second most used raw material after water. Digitally transforming this sector is key to optimizing operations. However, supervision and maintenance (predictive and corrective) are challenges little explored in this sector, due to the particularities of the sector, machinery and environmental conditions. All this, despite the successes achieved in other scenarios in monitoring with acoustic and contact sensors. We present an unsupervised learning scheme that trains a variational autoencoder model on a set of sound records. This is the first such dataset collected during processing plant operations, containing information from different points of the processing line. Our results demonstrate the model's ability to reconstruct and represent in latent space the recorded sounds, the differences in operating conditions and between different equipment. In the future, this should facilitate the classification of sounds, as well as the detection of anomalies and degradation patterns in the operation of the machinery.
* 10 pages, in Spanish language, 5 figures. Presented in Tecniacustica
2023 conference (Cuenca, Spain)
Via

Mar 01, 2023
Abstract:This paper provides a baseline system for First-shot-compliant unsupervised anomaly detection (ASD) for machine condition monitoring. First-shot ASD does not allow systems to do machine-type dependent hyperparameter tuning or tool ensembling based on the performance metric calculated with the grand truth. To show benchmark performance for First-shot ASD, this paper proposes an anomaly sound detection system that works on the domain generalization task in the Detection and Classification of Acoustic Scenes and Events (DCASE) 2022 Challenge Task 2: "Unsupervised Anomalous Sound Detection for Machine Condition Monitoring Applying Domain Generalization Technique" while complying with the First-shot requirements introduced in the DCASE 2023 Challenge Task 2 (DCASE2023T2). A simple autoencoder based implementation combined with selective Mahalanobis metric is implemented as a baseline system. The performance evaluation is conducted to set the target benchmark for the forthcoming DCASE2023T2. Source code of the baseline system will be available on GitHub: https://github.com/nttcslab/dcase2023_task2_baseline_ae .
* 5 pages, 2 figures
Via

May 05, 2023
Abstract:Although deep learning is the mainstream method in unsupervised anomalous sound detection, Gaussian Mixture Model (GMM) with statistical audio frequency representation as input can achieve comparable results with much lower model complexity and fewer parameters. Existing statistical frequency representations, e.g, the log-Mel spectrogram's average or maximum over time, do not always work well for different machines. This paper presents Time-Weighted Frequency Domain Representation (TWFR) with the GMM method (TWFR-GMM) for anomalous sound detection. The TWFR is a generalized statistical frequency domain representation that can adapt to different machine types, using the global weighted ranking pooling over time-domain. This allows GMM estimator to recognize anomalies, even under domain-shift conditions, as visualized with a Mahalanobis distance-based metric. Experiments on DCASE 2022 Challenge Task2 dataset show that our method has better detection performance than recent deep learning methods. TWFR-GMM is the core of our submission that achieved the 3rd place in DCASE 2022 Challenge Task2.
* To appear at ICASSP 2023
Via

Nov 12, 2021
Abstract:To develop a sound-monitoring system for machines, a method for detecting anomalous sound under domain shifts is proposed. A domain shift occurs when a machine's physical parameters change. Because a domain shift changes the distribution of normal sound data, conventional unsupervised anomaly detection methods can output false positives. To solve this problem, the proposed method constrains some latent variables of a normalizing flows (NF) model to represent physical parameters, which enables disentanglement of the factors of domain shifts and learning of a latent space that is invariant with respect to these domain shifts. Anomaly scores calculated from this domain-shift-invariant latent space are unaffected by such shifts, which reduces false positives and improves the detection performance. Experiments were conducted with sound data from a slide rail under different operation velocities. The results show that the proposed method disentangled the velocity to obtain a latent space that was invariant with respect to domain shifts, which improved the AUC by 13.2% for Glow with a single block and 2.6% for Glow with multiple blocks.
* 4 pages, 4 figures
Via
