Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:Learning the Spectrogram Temporal Resolution for Audio Classification

Oct 05, 2022

Haohe Liu, Xubo Liu, Qiuqiang Kong, Wenwu Wang, Mark D. Plumbley

Figure 1 for Learning the Spectrogram Temporal Resolution for Audio Classification

Figure 2 for Learning the Spectrogram Temporal Resolution for Audio Classification

Figure 3 for Learning the Spectrogram Temporal Resolution for Audio Classification

Figure 4 for Learning the Spectrogram Temporal Resolution for Audio Classification

Share this with someone who'll enjoy it:

Abstract:The audio spectrogram is a time-frequency representation that has been widely used for audio classification. The temporal resolution of a spectrogram depends on hop size. Previous works generally assume the hop size should be a constant value such as ten milliseconds. However, a fixed hop size or resolution is not always optimal for different types of sound. This paper proposes a novel method, DiffRes, that enables differentiable temporal resolution learning to improve the performance of audio classification models. Given a spectrogram calculated with a fixed hop size, DiffRes merges non-essential time frames while preserving important frames. DiffRes acts as a "drop-in" module between an audio spectrogram and a classifier, and can be end-to-end optimized. We evaluate DiffRes on the mel-spectrogram, followed by state-of-the-art classifier backbones, and apply it to five different subtasks. Compared with using the fixed-resolution mel-spectrogram, the DiffRes-based method can achieve the same or better classification accuracy with at least 25% fewer temporal dimensions on the feature level, which alleviates the computational cost at the same time. Starting from a high-temporal-resolution spectrogram such as one-millisecond hop size, we show that DiffRes can improve classification accuracy with the same computational complexity.

* Under review. Code open-sourced at https://github.com/haoheliu/diffres-python

View paper on

OpenReview

Share this with someone who'll enjoy it:

Title:Learning the Spectrogram Temporal Resolution for Audio Classification

Paper and Code