Yohei Kawaguchi

MIMII DG: Sound Dataset for Malfunctioning Industrial Machine Investigation and Inspection for Domain Generalization Task

May 27, 2022

Kota Dohi, Tomoya Nishida, Harsh Purohit, Ryo Tanabe, Takashi Endo, Masaaki Yamamoto, Yuki Nikaido, Yohei Kawaguchi

Abstract: We present a machine sound dataset to benchmark domain generalization techniques for anomalous sound detection (ASD). To handle performance degradation caused by domain shifts that are difficult to detect or too frequent to adapt to, domain generalization techniques are preferred. However, currently available datasets are ill-suited to evaluating these techniques, for example because they offer only a limited number of values for the parameters that cause domain shifts (domain shift parameters). In this paper, we present the first ASD dataset for domain generalization techniques, called MIMII DG. The dataset consists of five machine types and three domain shift scenarios for each machine type. We prepared at least two values for the domain shift parameters in the source domain. We also introduced domain shifts that can be difficult to notice. Experimental results using two baseline systems indicate that the dataset reproduces the domain shift scenarios and is useful for benchmarking domain generalization techniques.

Anomalous Sound Detection Based on Machine Activity Detection

Apr 15, 2022

Tomoya Nishida, Kota Dohi, Takashi Endo, Masaaki Yamamoto, Yohei Kawaguchi

Abstract: We have developed an unsupervised anomalous sound detection method for machine condition monitoring that utilizes an auxiliary task: detecting when the target machine is active. First, we train a model that detects machine activity by using normal data with machine activity labels. If the ground-truth activity labels are available in the inference phase, we use the activity-detection error as the anomaly score for a given sound clip. If these labels are not available, the anomaly score is calculated through outlier detection on the embedding vectors obtained by the activity-detection model. Solving this auxiliary task enables the model to learn the difference between the target machine sounds and similar background noise, which makes it possible to identify small deviations in the target sounds. Experimental results showed that the proposed method complements the conventional method, improving anomaly-detection performance when the two are combined in an ensemble.

* 5 pages, 2 figures, 1 table
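
As a rough illustration of the first scoring mode described in this abstract, the sketch below treats a clip's activity-detection error as its anomaly score. The choice of mean binary cross-entropy as the error measure, the function name `activity_error_score`, and the toy probabilities are all assumptions made for illustration; the abstract does not state which error measure the authors use.

```python
import math

def activity_error_score(pred_probs, true_labels, eps=1e-7):
    """Mean binary cross-entropy between predicted activity probabilities
    and ground-truth activity labels (1 = machine active, 0 = inactive).
    A larger value means the activity detector fits the clip worse."""
    if len(pred_probs) != len(true_labels) or not pred_probs:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for p, y in zip(pred_probs, true_labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(pred_probs)

# Hypothetical frame-wise labels and detector outputs for two clips.
labels = [0, 0, 1, 1, 1, 0]
normal_clip = [0.1, 0.2, 0.9, 0.8, 0.9, 0.1]  # detector tracks activity well
odd_clip = [0.4, 0.6, 0.5, 0.3, 0.6, 0.5]     # detector is confused

normal_score = activity_error_score(normal_clip, labels)
odd_score = activity_error_score(odd_clip, labels)
print(normal_score < odd_score)
```

Under this assumption, a clip whose sound deviates from normal operation tends to confuse the activity detector and therefore receives a higher score.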

Environmental Sound Extraction Using Onomatopoeia

Dec 02, 2021

Yuki Okamoto, Shota Horiguchi, Masaaki Yamamoto, Keisuke Imoto, Yohei Kawaguchi

Abstract: Onomatopoeia, which is a character sequence that phonetically imitates a sound, is effective in expressing characteristics of sound such as duration, pitch, and timbre. We propose an environmental-sound-extraction method using onomatopoeia to specify the target sound to be extracted. With this method, we estimate a time-frequency mask from an input mixture spectrogram and onomatopoeia by using a U-Net architecture and then extract the corresponding target sound by masking the spectrogram. Experimental results indicate that the proposed method can extract only the target sound corresponding to the onomatopoeia and performs better than conventional methods that use sound-event classes to specify the target sound.

* Submitted to ICASSP2022
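
The masking step mentioned in this abstract can be sketched as an element-wise product between the mixture spectrogram and a time-frequency mask. In the paper the mask is estimated by a U-Net from the mixture and the onomatopoeia; here the mask is simply hard-coded, and the function name `apply_mask` and all values are hypothetical.

```python
def apply_mask(mixture_spec, mask):
    """Multiply a magnitude spectrogram by a time-frequency mask, bin by bin.
    Both arguments are lists of rows (frequency bins) of equal shape;
    mask values are expected to lie in [0, 1]."""
    if len(mixture_spec) != len(mask):
        raise ValueError("spectrogram and mask must have the same shape")
    extracted = []
    for spec_row, mask_row in zip(mixture_spec, mask):
        if len(spec_row) != len(mask_row):
            raise ValueError("spectrogram and mask must have the same shape")
        extracted.append([s * m for s, m in zip(spec_row, mask_row)])
    return extracted

# Toy 2 x 3 magnitude spectrogram and a hand-written mask (assumed values).
mixture = [[2.0, 4.0, 1.0],
           [0.5, 3.0, 8.0]]
mask = [[1.0, 0.0, 0.5],
        [0.0, 1.0, 0.25]]

target = apply_mask(mixture, mask)
print(target)
```

Bins where the mask is close to 1 are kept as part of the target sound, and bins where it is close to 0 are suppressed.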

Disentangling Physical Parameters for Anomalous Sound Detection Under Domain Shifts

Nov 12, 2021

Kota Dohi, Takashi Endo, Yohei Kawaguchi

Abstract: To develop a sound-monitoring system for machines, a method for detecting anomalous sound under domain shifts is proposed. A domain shift occurs when a machine's physical parameters change. Because a domain shift changes the distribution of normal sound data, conventional unsupervised anomaly detection methods can output false positives. To solve this problem, the proposed method constrains some latent variables of a normalizing flow (NF) model to represent physical parameters, which enables disentanglement of the factors of domain shifts and learning of a latent space that is invariant with respect to these domain shifts. Anomaly scores calculated from this domain-shift-invariant latent space are unaffected by such shifts, which reduces false positives and improves the detection performance. Experiments were conducted with sound data from a slide rail under different operation velocities. The results show that the proposed method disentangled the velocity to obtain a latent space that was invariant with respect to domain shifts, which improved the AUC by 13.2% for Glow with a single block and 2.6% for Glow with multiple blocks.

* 4 pages, 4 figures

Multi-Channel End-to-End Neural Diarization with Distributed Microphones

Oct 10, 2021

Shota Horiguchi, Yuki Takashima, Paola Garcia, Shinji Watanabe, Yohei Kawaguchi

Abstract: Recent progress on end-to-end neural diarization (EEND) has enabled overlap-aware speaker diarization with a single neural network. This paper proposes to enhance EEND by using multi-channel signals from distributed microphones. We replace Transformer encoders in EEND with two types of encoders that process a multi-channel input: spatio-temporal and co-attention encoders. Both are independent of the number and geometry of microphones and suitable for distributed microphone settings. We also propose a model adaptation method using only single-channel recordings. With simulated and real-recorded datasets, we demonstrated that the proposed method outperformed conventional EEND when a multi-channel input was given while maintaining comparable performance with a single-channel input. We also showed that the proposed method performed well even when the multi-channel inputs carried no usable spatial information, such as in hybrid meetings in which the utterances of multiple remote participants are played back from the same loudspeaker.

Towards Neural Diarization for Unlimited Numbers of Speakers Using Global and Local Attractors

Jul 04, 2021

Shota Horiguchi, Shinji Watanabe, Paola Garcia, Yawen Xue, Yuki Takashima, Yohei Kawaguchi

Abstract: Attractor-based end-to-end diarization is achieving accuracy comparable to that of carefully tuned conventional clustering-based methods on challenging datasets. However, its main drawback is that it cannot deal with the case where the number of speakers is larger than that observed during training. This is because its speaker counting relies on supervised learning. In this work, we introduce an unsupervised clustering process embedded in the attractor-based end-to-end diarization. We first split a sequence of frame-wise embeddings into short subsequences and then perform attractor-based diarization for each subsequence. Given subsequence-wise diarization results, inter-subsequence speaker correspondence is obtained by unsupervised clustering of the vectors computed from the attractors from all the subsequences. This makes it possible to produce diarization results for a large number of speakers over the whole recording even if the number of output speakers for each subsequence is limited. Experimental results showed that our method could produce accurate diarization results for an unseen number of speakers. Our method achieved diarization error rates of 11.84%, 28.33%, and 19.49% on the CALLHOME, DIHARD II, and DIHARD III datasets, respectively, each of which is better than those of conventional end-to-end diarization methods.
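
The two-stage idea in this abstract (split the recording into short subsequences, then link local speakers across subsequences by clustering attractor-derived vectors) can be sketched as follows. This is only a toy under stated assumptions: the greedy cosine-similarity clustering with a fixed threshold is a stand-in chosen for brevity, not the clustering used in the paper, and the helper names and the 2-D "attractor" vectors are hypothetical.

```python
import math

def split_into_subsequences(frames, length):
    """Cut a frame sequence into consecutive chunks of at most `length` frames."""
    return [frames[i:i + length] for i in range(0, len(frames), length)]

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def cluster_attractors(attractors, threshold=0.8):
    """Greedy stand-in clustering: join the first cluster whose representative
    is similar enough, otherwise open a new cluster (a new global speaker)."""
    representatives, labels = [], []
    for vec in attractors:
        for idx, rep in enumerate(representatives):
            if cosine(vec, rep) >= threshold:
                labels.append(idx)
                break
        else:
            representatives.append(vec)
            labels.append(len(representatives) - 1)
    return labels

chunks = split_into_subsequences(list(range(10)), 4)

# Assumed attractors: two local speakers in each of the first two chunks,
# one in the last chunk, flattened in chunk order.
attractors = [(1.0, 0.1), (0.0, 1.0),   # chunk 0
              (0.1, 1.0), (1.0, 0.0),   # chunk 1 (same speakers, swapped order)
              (-1.0, -1.0)]             # chunk 2 (a new speaker)
global_labels = cluster_attractors(attractors)
print(chunks, global_labels)
```

In this toy run no chunk contains more than two local speakers, yet three global speakers emerge once the attractors are clustered across chunks, which mirrors the motivation given in the abstract.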

Description and Discussion on DCASE 2021 Challenge Task 2: Unsupervised Anomalous Sound Detection for Machine Condition Monitoring under Domain Shifted Conditions

Jun 08, 2021

Yohei Kawaguchi, Keisuke Imoto, Yuma Koizumi, Noboru Harada, Daisuke Niizumi, Kota Dohi, Ryo Tanabe, Harsh Purohit, Takashi Endo

Abstract: We present the task description and a discussion of the results of the DCASE 2021 Challenge Task 2. Last year, we organized an unsupervised anomalous sound detection (ASD) task: identifying whether a given sound is normal or anomalous without anomalous training data. This year, we organize an advanced unsupervised ASD task under domain-shift conditions, which focuses on a problem that is inevitable in the practical use of ASD systems. The main challenge of this task is to detect unknown anomalous sounds where the acoustic characteristics of the training and testing samples are different, i.e., domain-shifted. This problem frequently occurs due to changes in seasons, manufactured products, and/or environmental noise. After the challenge submission deadline, we will add challenge results and analysis of the submissions.

* Submitted to DCASE 2021 Workshop. arXiv admin note: text overlap with arXiv:2006.05822

MIMII DUE: Sound Dataset for Malfunctioning Industrial Machine Investigation and Inspection with Domain Shifts due to Changes in Operational and Environmental Conditions

May 07, 2021

Ryo Tanabe, Harsh Purohit, Kota Dohi, Takashi Endo, Yuki Nikaido, Toshiki Nakamura, Yohei Kawaguchi

Abstract: In this paper, we introduce a new dataset for malfunctioning industrial machine investigation and inspection with domain shifts due to changes in operational and environmental conditions (MIMII DUE). Conventional methods for anomalous sound detection face challenges in practice because the distribution of features changes between the training and operational phases (called domain shift) due to real-world factors. To check robustness against domain shifts, we need a dataset with domain shifts, but no such dataset has existed so far. The new dataset consists of normal and abnormal operating sounds of five different types of industrial machines under two different operational/environmental conditions (source domain and target domain) that are independent of the normal/abnormal state, with domain shifts occurring between the two domains. Experimental results show significant performance differences between the source and target domains, indicating that the dataset contains domain shifts. These results indicate that the dataset will be helpful for checking robustness against domain shifts. The dataset is a subset of the dataset for DCASE 2021 Challenge Task 2 and is freely available for download at https://zenodo.org/record/4740355

* 5 pages, under review for WASPAA 2021, disambiguation (in 2.2, sound proof room -> sound isolation booth, anechoic room -> anechoic chamber)

Flow-based Self-supervised Density Estimation for Anomalous Sound Detection

Mar 16, 2021

Kota Dohi, Takashi Endo, Harsh Purohit, Ryo Tanabe, Yohei Kawaguchi

Abstract: To develop a machine sound monitoring system, a method for detecting anomalous sound is proposed. Exact likelihood estimation using Normalizing Flows is a promising technique for unsupervised anomaly detection, but it can fail at out-of-distribution detection since the likelihood is affected by the smoothness of the data. To improve the detection performance, we train the model to assign higher likelihood to target machine sounds and lower likelihood to sounds from other machines of the same machine type. We demonstrate that this enables the model to incorporate a self-supervised classification-based approach. Experiments conducted using the DCASE 2020 Challenge Task 2 dataset showed that the proposed method improves the AUC by 4.6% on average when using Masked Autoregressive Flow (MAF) and by 5.8% when using Glow, which is a significant improvement over the previous method.

* 5 pages, 1 figure, accepted in ICASSP 2021
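
Several abstracts on this page report results as AUC. For readers unfamiliar with the metric, the sketch below computes the area under the ROC curve from anomaly scores with the standard pairwise (rank-based) definition: the fraction of anomalous/normal pairs in which the anomalous clip gets the higher score, with ties counted as half. The function name `roc_auc` and the toy scores are illustrative assumptions and are not taken from any of the listed papers.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.
    `labels` uses 1 for anomalous clips and 0 for normal clips;
    a higher score is meant to indicate a more anomalous clip."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomalous and one normal clip")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))

# Toy anomaly scores: two normal clips followed by two anomalous clips.
auc = roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
print(auc)  # 0.75
```

An AUC of 1.0 means every anomalous clip scores above every normal clip, while 0.5 corresponds to chance-level ranking.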

Deep Autoencoding GMM-based Unsupervised Anomaly Detection in Acoustic Signals and its Hyper-parameter Optimization

Sep 25, 2020

Harsh Purohit, Ryo Tanabe, Takashi Endo, Kaori Suefusa, Yuki Nikaido, Yohei Kawaguchi

Abstract: Failures or breakdowns in factory machinery can be costly to companies, so there is an increasing demand for automatic machine inspection. Existing approaches to acoustic signal-based unsupervised anomaly detection, such as those using a deep autoencoder (DA) or Gaussian mixture model (GMM), have poor anomaly-detection performance. In this work, we propose a new method based on a deep autoencoding Gaussian mixture model with hyper-parameter optimization (DAGMM-HO). The DAGMM-HO applies the conventional DAGMM to the audio domain for the first time, with the idea that its joint optimization of dimensionality reduction and statistical modelling will improve the anomaly-detection performance. In addition, the DAGMM-HO solves the hyper-parameter sensitivity problem of the conventional DAGMM by performing hyper-parameter optimization based on the gap statistic and the cumulative eigenvalues. Our evaluation of the proposed method with experimental data from industrial fans showed that it significantly outperforms previous approaches and achieves up to a 20% improvement based on the standard AUC score.

* 5 pages, to appear in DCASE 2020 Workshop
