Abstract:Dynamic functional connectivity captures time-varying brain states for better neuropsychiatric diagnosis and spatio-temporal interpretability, i.e., identifying when discriminative disease signatures emerge and where they reside in the connectivity topology. Reliable interpretability faces major challenges: diagnostic signals are often subtle and sparsely distributed across both time and topology, while nuisance fluctuations and non-diagnostic connectivities are pervasive. To address these issues, we propose BrainSTR, a spatio-temporal contrastive learning framework for interpretable dynamic brain network modeling. BrainSTR learns state-consistent phase boundaries via a data-driven Adaptive Phase Partition module, identifies diagnostically critical phases with attention, and extracts disease-related connectivity within each phase using an Incremental Graph Structure Generator regularized by binarization, temporal smoothness, and sparsity. Then, we introduce a spatio-temporal supervised contrastive learning approach that leverages diagnosis-relevant spatio-temporal patterns to refine the similarity metric between samples and capture more discriminative spatio-temporal features, thereby constructing a well-structured semantic space for coherent and interpretable representations. Experiments on ASD, BD, and MDD validate the effectiveness of BrainSTR, and the discovered critical phases and subnetworks provide interpretable evidence consistent with prior neuroimaging findings. Our code: https://anonymous.4open.science/r/BrainSTR1.




Abstract:In this paper, we propose a model for bird sound event detection that focuses on a small number of training samples within the everyday long-tail distribution. As a result, we investigate bird sound detection using the few-shot learning paradigm. By integrating channel and spatial attention mechanisms, improved feature representations can be learned from few-shot training datasets. We develop a Metric Channel-Spatial Network model by incorporating a Channel Spatial Squeeze-Excitation block into the prototype network, combining it with these attention mechanisms. We evaluate the Metric Channel Spatial Network model on the DCASE 2022 Take5 dataset benchmark, achieving an F-measure of 66.84% and a PSDS of 58.98%. Our experiment demonstrates that the combination of channel and spatial attention mechanisms effectively enhances the performance of bird sound classification and detection.