This paper proposes a model for bird sound detection. Bird sounds fall among the sparsely sampled categories in the long-tail distribution of everyday sounds, so we study bird sound detection under the few-shot learning paradigm. Combining channel and spatial attention mechanisms allows better feature representations to be learned from few-shot training data. We construct a Metric Channel-Spatial Network by merging a Channel-Spatial SE block into the prototypical network, so that the network incorporates both attention mechanisms. We evaluate the Metric Channel-Spatial Network on the DCASE 2022 Task 5 benchmark dataset and obtain an F-measure of $66.84\%$ and a PSDS of $58.98\%$. The experiments indicate that combining channel and spatial attention mechanisms can effectively improve the performance of bird sound classification and detection.
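To make the idea concrete, the following is a minimal sketch, not the implementation evaluated in this paper. It gates a small feature map with a channel branch and a spatial branch, pools the result into an embedding, and labels a query by its nearest class prototype. Several details are assumptions made only for illustration: the single linear layer in the channel branch (no bottleneck), the 1x1-convolution-style spatial branch, the additive fusion of the two gated maps, the hand-set weights `W_CHANNEL` and `W_SPATIAL`, and the toy feature values. All function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_spatial_gate(feat, w_channel, w_spatial):
    """Recalibrate a feature map with channel and spatial gates.

    feat:      C channels, each a list of N time-frequency positions.
    w_channel: C x C weights (assumed single linear layer, no bottleneck).
    w_spatial: C weights of an assumed 1x1 convolution across channels.
    """
    n_ch, n_pos = len(feat), len(feat[0])
    # Channel branch: squeeze each channel to its mean, then excite.
    squeezed = [sum(ch) / n_pos for ch in feat]
    ch_gate = [
        sigmoid(sum(w_channel[i][j] * squeezed[j] for j in range(n_ch)))
        for i in range(n_ch)
    ]
    # Spatial branch: one gate per position from a weighted sum over channels.
    sp_gate = [
        sigmoid(sum(w_spatial[c] * feat[c][p] for c in range(n_ch)))
        for p in range(n_pos)
    ]
    # Assumed fusion: add the channel-gated and spatially gated maps.
    return [
        [feat[c][p] * (ch_gate[c] + sp_gate[p]) for p in range(n_pos)]
        for c in range(n_ch)
    ]


def embed(feat, w_channel, w_spatial):
    """Gate the feature map, then average over positions (one value per channel)."""
    gated = channel_spatial_gate(feat, w_channel, w_spatial)
    return [sum(ch) / len(ch) for ch in gated]


def build_prototypes(support):
    """Average the support embeddings of each class into one prototype."""
    protos = {}
    for label, vecs in support.items():
        dim = len(vecs[0])
        protos[label] = [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]
    return protos


def classify(query, protos):
    """Return the label of the nearest prototype (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(protos, key=lambda label: sq_dist(query, protos[label]))


# Toy data: 2 channels x 3 positions; values are illustrative, not from the paper.
W_CHANNEL = [[1.0, 0.0], [0.0, 1.0]]
W_SPATIAL = [0.5, 0.5]
support_maps = {
    "bird": [[[1.0, 1.2, 0.8], [0.1, 0.0, 0.2]],
             [[0.9, 1.0, 1.1], [0.0, 0.1, 0.1]]],
    "background": [[[0.1, 0.2, 0.0], [0.9, 1.0, 1.1]],
                   [[0.0, 0.1, 0.2], [1.0, 0.8, 1.2]]],
}
support = {
    label: [embed(m, W_CHANNEL, W_SPATIAL) for m in maps]
    for label, maps in support_maps.items()
}
prototypes = build_prototypes(support)
query = embed([[0.9, 1.1, 1.0], [0.2, 0.1, 0.0]], W_CHANNEL, W_SPATIAL)
print(classify(query, prototypes))
```

In a trained model the gate weights and the underlying feature extractor would be learned from the training episodes rather than set by hand. Only the prototype step (class means and nearest-prototype assignment) follows the standard prototypical-network formulation.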
* 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference