Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Anbai Jiang

Exploring Self-Supervised Audio Models for Generalized Anomalous Sound Detection

Aug 17, 2025

Bing Han, Anbai Jiang, Xinhu Zheng, Wei-Qiang Zhang, Jia Liu, Pingyi Fan, Yanmin Qian

Figure 1 for Exploring Self-Supervised Audio Models for Generalized Anomalous Sound Detection

Figure 2 for Exploring Self-Supervised Audio Models for Generalized Anomalous Sound Detection

Figure 3 for Exploring Self-Supervised Audio Models for Generalized Anomalous Sound Detection

Figure 4 for Exploring Self-Supervised Audio Models for Generalized Anomalous Sound Detection

Abstract:Machine anomalous sound detection (ASD) is a valuable technique across various applications. However, its generalization performance is often limited due to challenges in data collection and the complexity of acoustic environments. Inspired by the success of large pre-trained models in numerous fields, this paper introduces a robust ASD model that leverages self-supervised pre-trained models trained on large-scale speech and audio datasets. Although there are inconsistencies between the pre-training datasets and the ASD task, our findings indicate that pre-training still provides substantial benefits for ASD. To mitigate overfitting and retain learned knowledge when fine-tuning with limited data, we explore Fully-Connected Low-Rank Adaptation (LoRA) as an alternative to full fine-tuning. Additionally, we propose a Machine-aware Group Adapter module, which enables the model to capture differences between various machines within a unified framework, thereby enhancing the generalization performance of ASD systems. To address the challenge of missing attribute labels, we design a novel objective function that dynamically clusters unattributed data using vector quantization and optimizes through a dual-level contrastive learning loss. The proposed methods are evaluated on all benchmark datasets, including the DCASE 2020-2024 five ASD challenges, and the experimental results show significant improvements of our new approach and demonstrate the effectiveness of our proposed strategies.

* Accepted by TASLP. 15 pages, 7 figures;

Via

Access Paper or Ask Questions

FISHER: A Foundation Model for Multi-Modal Industrial Signal Comprehensive Representation

Jul 22, 2025

Pingyi Fan, Anbai Jiang, Shuwei Zhang, Zhiqiang Lv, Bing Han, Xinhu Zheng, Wenrui Liang, Junjie Li, Wei-Qiang Zhang, Yanmin Qian(+3 more)

Abstract:With the rapid deployment of SCADA systems, how to effectively analyze industrial signals and detect abnormal states is an urgent need for the industry. Due to the significant heterogeneity of these signals, which we summarize as the M5 problem, previous works only focus on small sub-problems and employ specialized models, failing to utilize the synergies between modalities and the powerful scaling law. However, we argue that the M5 signals can be modeled in a unified manner due to the intrinsic similarity. As a result, we propose FISHER, a Foundation model for multi-modal Industrial Signal compreHEnsive Representation. To support arbitrary sampling rates, FISHER considers the increment of sampling rate as the concatenation of sub-band information. Specifically, FISHER takes the STFT sub-band as the modeling unit and adopts a teacher student SSL framework for pre-training. We also develop the RMIS benchmark, which evaluates the representations of M5 industrial signals on multiple health management tasks. Compared with top SSL models, FISHER showcases versatile and outstanding capabilities with a general performance gain up to 5.03%, along with much more efficient scaling curves. We also investigate the scaling law on downstream tasks and derive potential avenues for future works. FISHER is now open-sourced on https://github.com/jianganbai/FISHER

* 11 pages, 6 figures

Via

Access Paper or Ask Questions

Improving Anomalous Sound Detection via Low-Rank Adaptation Fine-Tuning of Pre-Trained Audio Models

Sep 11, 2024

Xinhu Zheng, Anbai Jiang, Bing Han, Yanmin Qian, Pingyi Fan, Jia Liu, Wei-Qiang Zhang

Figure 1 for Improving Anomalous Sound Detection via Low-Rank Adaptation Fine-Tuning of Pre-Trained Audio Models

Figure 2 for Improving Anomalous Sound Detection via Low-Rank Adaptation Fine-Tuning of Pre-Trained Audio Models

Figure 3 for Improving Anomalous Sound Detection via Low-Rank Adaptation Fine-Tuning of Pre-Trained Audio Models

Figure 4 for Improving Anomalous Sound Detection via Low-Rank Adaptation Fine-Tuning of Pre-Trained Audio Models

Abstract:Anomalous Sound Detection (ASD) has gained significant interest through the application of various Artificial Intelligence (AI) technologies in industrial settings. Though possessing great potential, ASD systems can hardly be readily deployed in real production sites due to the generalization problem, which is primarily caused by the difficulty of data collection and the complexity of environmental factors. This paper introduces a robust ASD model that leverages audio pre-trained models. Specifically, we fine-tune these models using machine operation data, employing SpecAug as a data augmentation strategy. Additionally, we investigate the impact of utilizing Low-Rank Adaptation (LoRA) tuning instead of full fine-tuning to address the problem of limited data for fine-tuning. Our experiments on the DCASE2023 Task 2 dataset establish a new benchmark of 77.75% on the evaluation set, with a significant improvement of 6.48% compared with previous state-of-the-art (SOTA) models, including top-tier traditional convolutional networks and speech pre-trained models, which demonstrates the effectiveness of audio pre-trained models with LoRA tuning. Ablation studies are also conducted to showcase the efficacy of the proposed scheme.

Via

Access Paper or Ask Questions

CoopASD: Cooperative Machine Anomalous Sound Detection with Privacy Concerns

Aug 27, 2024

Anbai Jiang, Yuchen Shi, Pingyi Fan, Wei-Qiang Zhang, Jia Liu

Figure 1 for CoopASD: Cooperative Machine Anomalous Sound Detection with Privacy Concerns

Figure 2 for CoopASD: Cooperative Machine Anomalous Sound Detection with Privacy Concerns

Figure 3 for CoopASD: Cooperative Machine Anomalous Sound Detection with Privacy Concerns

Figure 4 for CoopASD: Cooperative Machine Anomalous Sound Detection with Privacy Concerns

Abstract:Machine anomalous sound detection (ASD) has emerged as one of the most promising applications in the Industrial Internet of Things (IIoT) due to its unprecedented efficacy in mitigating risks of malfunctions and promoting production efficiency. Previous works mainly investigated the machine ASD task under centralized settings. However, developing the ASD system under decentralized settings is crucial in practice, since the machine data are dispersed in various factories and the data should not be explicitly shared due to privacy concerns. To enable these factories to cooperatively develop a scalable ASD model while preserving their privacy, we propose a novel framework named CoopASD, where each factory trains an ASD model on its local dataset, and a central server aggregates these local models periodically. We employ a pre-trained model as the backbone of the ASD model to improve its robustness and develop specialized techniques to stabilize the model under a completely non-iid and domain shift setting. Compared with previous state-of-the-art (SOTA) models trained in centralized settings, CoopASD showcases competitive results with negligible degradation of 0.08%. We also conduct extensive ablation studies to demonstrate the effectiveness of CoopASD.

* Accepted by GLOBECOM 2024

Via

Access Paper or Ask Questions

AnoPatch: Towards Better Consistency in Machine Anomalous Sound Detection

Jun 17, 2024

Anbai Jiang, Bing Han, Zhiqiang Lv, Yufeng Deng, Wei-Qiang Zhang, Xie Chen, Yanmin Qian, Jia Liu, Pingyi Fan

Figure 1 for AnoPatch: Towards Better Consistency in Machine Anomalous Sound Detection

Figure 2 for AnoPatch: Towards Better Consistency in Machine Anomalous Sound Detection

Figure 3 for AnoPatch: Towards Better Consistency in Machine Anomalous Sound Detection

Figure 4 for AnoPatch: Towards Better Consistency in Machine Anomalous Sound Detection

Abstract:Large pre-trained models have demonstrated dominant performances in multiple areas, where the consistency between pre-training and fine-tuning is the key to success. However, few works reported satisfactory results of pre-trained models for the machine anomalous sound detection (ASD) task. This may be caused by the inconsistency of the pre-trained model and the inductive bias of machine audio, resulting in inconsistency in data and architecture. Thus, we propose AnoPatch which utilizes a ViT backbone pre-trained on AudioSet and fine-tunes it on machine audio. It is believed that machine audio is more related to audio datasets than speech datasets, and modeling it from patch level suits the sparsity of machine audio. As a result, AnoPatch showcases state-of-the-art (SOTA) performances on the DCASE 2020 ASD dataset and the DCASE 2023 ASD dataset. We also compare multiple pre-trained models and empirically demonstrate that better consistency yields considerable improvement.

* Accepted by INTERSPEECH 2024

Via

Access Paper or Ask Questions

Unsupervised Anomaly Detection and Localization of Machine Audio: A GAN-based Approach

Mar 31, 2023

Anbai Jiang, Wei-Qiang Zhang, Yufeng Deng, Pingyi Fan, Jia Liu

Figure 1 for Unsupervised Anomaly Detection and Localization of Machine Audio: A GAN-based Approach

Figure 2 for Unsupervised Anomaly Detection and Localization of Machine Audio: A GAN-based Approach

Figure 3 for Unsupervised Anomaly Detection and Localization of Machine Audio: A GAN-based Approach

Figure 4 for Unsupervised Anomaly Detection and Localization of Machine Audio: A GAN-based Approach

Abstract:Automatic detection of machine anomaly remains challenging for machine learning. We believe the capability of generative adversarial network (GAN) suits the need of machine audio anomaly detection, yet rarely has this been investigated by previous work. In this paper, we propose AEGAN-AD, a totally unsupervised approach in which the generator (also an autoencoder) is trained to reconstruct input spectrograms. It is pointed out that the denoising nature of reconstruction deprecates its capacity. Thus, the discriminator is redesigned to aid the generator during both training stage and detection stage. The performance of AEGAN-AD on the dataset of DCASE 2022 Challenge TASK 2 demonstrates the state-of-the-art result on five machine types. A novel anomaly localization method is also investigated. Source code available at: www.github.com/jianganbai/AEGAN-AD

* Accepted by ICASSP 2023

Via

Access Paper or Ask Questions