We present the task description of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2023 Challenge Task 2: "First-shot unsupervised anomalous sound detection (ASD) for machine condition monitoring". The main goal is to enable rapid deployment of ASD systems for new kinds of machines using only a few normal samples, without the need for hyperparameter tuning. In the past ASD tasks, developed methods tuned hyperparameters for each machine type, as the development and evaluation datasets had the same machine types. However, collecting normal and anomalous data as the development dataset can be infeasible in practice. In 2023 Task 2, we focus on solving first-shot problem, which is the challenge of training a model on a few machines of a completely novel machine type. Specifically, (i) each machine type has only one section, and (ii) machine types in the development and evaluation datasets are completely different. We will add challenge results and analysis of the submissions after the challenge submission deadline.
Semi-supervised anomaly detection~(SSAD) is a task where normal data and a limited number of anomalous data are available for training. In practical situations, SSAD methods suffer adapting to domain shifts, since anomalous data are unlikely to be available for the target domain in the training phase. To solve this problem, we propose a domain adaptation method for SSAD where no anomalous data are available for the target domain. First, we introduce a domain-adversarial network to a variational auto-encoder-based SSAD model to obtain domain-invariant latent variables. Since the decoder cannot reconstruct the original data solely from domain-invariant latent variables, we conditioned the decoder on the domain label. To compensate for the missing anomalous data of the target domain, we introduce an importance sampling-based weighted loss function that approximates the ideal loss function. Experimental results indicate that the proposed method helps adapt SSAD models to the target domain when no anomalous data are available for the target domain.
We present the task description of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2022 Challenge Task 2: "Unsupervised anomalous sound detection (ASD) for machine condition monitoring applying domain generalization techniques". Domain shifts are a critical problem for the application of ASD systems. Because domain shifts can change the acoustic characteristics of data, a model trained in a source domain performs poorly for a target domain. In DCASE 2021 Challenge Task 2, we organized an ASD task for handling domain shifts. In this task, it was assumed that the occurrences of domain shifts are known. However, in practice, the domain of each sample may not be given, and the domain shifts can occur implicitly. In 2022 Task 2, we focus on domain generalization techniques that detects anomalies regardless of the domain shifts. Specifically, the domain of each sample is not given in the test data and only one threshold is allowed for all domains. We will add challenge results and analysis of the submissions after the challenge submission deadline.
We present a machine sound dataset to benchmark domain generalization techniques for anomalous sound detection (ASD). To handle performance degradation caused by domain shifts that are difficult to detect or too frequent to adapt, domain generalization techniques are preferred. However, currently available datasets have difficulties in evaluating these techniques, such as limited number of values for parameters that cause domain shifts (domain shift parameters). In this paper, we present the first ASD dataset for the domain generalization techniques, called MIMII DG. The dataset consists of five machine types and three domain shift scenarios for each machine type. We prepared at least two values for the domain shift parameters in the source domain. Also, we introduced domain shifts that can be difficult to notice. Experimental results using two baseline systems indicate that the dataset reproduces the domain shift scenarios and is useful for benchmarking domain generalization techniques.
We have developed an unsupervised anomalous sound detection method for machine condition monitoring that utilizes an auxiliary task -- detecting when the target machine is active. First, we train a model that detects machine activity by using normal data with machine activity labels and then use the activity-detection error as the anomaly score for a given sound clip if we have access to the ground-truth activity labels in the inference phase. If these labels are not available, the anomaly score is calculated through outlier detection on the embedding vectors obtained by the activity-detection model. Solving this auxiliary task enables the model to learn the difference between the target machine sounds and similar background noise, which makes it possible to identify small deviations in the target sounds. Experimental results showed that the proposed method improves the anomaly-detection performance of the conventional method complementarily by means of an ensemble.
A new impulse response (IR) dataset called "MeshRIR" is introduced. Currently available datasets usually include IRs at an array of microphones from several source positions under various room conditions, which are basically designed for evaluating speech enhancement and distant speech recognition methods. On the other hand, methods of estimating or controlling spatial sound fields have been extensively investigated in recent years; however, the current IR datasets are not applicable to validating and comparing these methods because of the low spatial resolution of measurement points. MeshRIR consists of IRs measured at positions obtained by finely discretizing a spatial region. Two subdatasets are currently available: one consists of IRs in a three-dimensional cuboidal region from a single source, and the other consists of IRs in a two-dimensional square region from an array of 32 sources. Therefore, MeshRIR is suitable for evaluating sound field analysis and synthesis methods. This dataset is freely available at \url{https://sh01k.github.io/MeshRIR/} with some codes of sample applications.
As the labor force decreases, the demand for labor-saving automatic anomalous sound detection technology that conducts maintenance of industrial equipment has grown. Conventional approaches detect anomalies based on the reconstruction errors of an autoencoder. However, when the target machine sound is non-stationary, a reconstruction error tends to be large independent of an anomaly, and its variations increased because of the difficulty of predicting the edge frames. To solve the issue, we propose an approach to anomalous detection in which the model utilizes multiple frames of a spectrogram whose center frame is removed as an input, and it predicts an interpolation of the removed frame as an output. Rather than predicting the edge frames, the proposed approach makes the reconstruction error consistent with the anomaly. Experimental results showed that the proposed approach achieved 27% improvement based on the standard AUC score, especially against non-stationary machinery sounds.