Brain graphs, which model the structural and functional relationships between brain regions, are crucial in neuroscientific and clinical applications involving graph classification. However, dense brain graphs pose computational challenges including high runtime and memory usage and limited interpretability. In this paper, we investigate effective designs in Graph Neural Networks (GNNs) to sparsify brain graphs by eliminating noisy edges. While prior works remove noisy edges based on explainability or task-irrelevant properties, their effectiveness in enhancing performance with sparsified graphs is not guaranteed. Moreover, existing approaches often overlook collective edge removal across multiple graphs. To address these issues, we introduce an iterative framework to analyze different sparsification models. Our findings are as follows: (i) methods prioritizing interpretability may not be suitable for graph sparsification as they can degrade GNNs' performance in graph classification tasks; (ii) simultaneously learning edge selection with GNN training is more beneficial than post-training; (iii) a shared edge selection across graphs outperforms separate selection for each graph; and (iv) task-relevant gradient information aids in edge selection. Based on these insights, we propose a new model, Interpretable Graph Sparsification (IGS), which enhances graph classification performance by up to 5.1% with 55.0% fewer edges. The retained edges identified by IGS provide neuroscientific interpretations and are supported by well-established literature.
We investigate the question of whether the knowledge learned by graph neural networks (GNNs) from small graphs is generalizable to large graphs in the same domain. Prior works suggest that the distribution shift, particularly in the degree distribution, between graphs of different sizes can lead to performance degradation in the graph classification task. However, this may not be the case for biological datasets where the degrees are bounded and the distribution shift of degrees is small. Even with little degree distribution shift, our observations show that GNNs' performance on larger graphs from the same datasets still degrades, suggesting other causes. In fact, there has been a lack of exploration in real datasets to understand the types and properties of distribution shifts caused by various graph sizes. Furthermore, previous analyses of size generalizability mostly focus on the spatial domain. To fill these gaps, we take the spectral perspective and study the size generalizability of GNNs on biological data. We identify a distribution shift between small and large graphs in the eigenvalues of the normalized Laplacian/adjacency matrix, indicating a difference in the global node connectivity, which is found to be correlated with the node closeness centrality. We further find that despite of the variations in global connectivity, graphs of different sizes share similar local connectivity, which can be utilized to improve the size generalizability of GNNs. Based on our spectral insights and empirical observations, we propose a model-agnostic strategy, SIA, which uses size-irrelevant local structural features, i.e., the local closeness centrality of a node, to guide the learning process. Our empirical results demonstrate that our strategy improves the graph classification performance of various GNNs on small and large graphs when training with only small graphs.
Neural networks (NN) for single-view 3D reconstruction (SVR) have gained in popularity. Recent work points out that for SVR, most cutting-edge NNs have limited performance on reconstructing unseen objects because they rely primarily on recognition (i.e., classification-based methods) rather than shape reconstruction. To understand this issue in depth, we provide a systematic study on when and why NNs prefer recognition to reconstruction and vice versa. Our finding shows that a leading factor in determining recognition versus reconstruction is how dispersed the training data is. Thus, we introduce the dispersion score, a new data-driven metric, to quantify this leading factor and study its effect on NNs. We hypothesize that NNs are biased toward recognition when training images are more dispersed and training shapes are less dispersed. Our hypothesis is supported and the dispersion score is proved effective through our experiments on synthetic and benchmark datasets. We show that the proposed metric is a principal way to analyze reconstruction quality and provides novel information in addition to the conventional reconstruction score.
Graph classification has applications in bioinformatics, social sciences, automated fake news detection, web document classification, and more. In many practical scenarios, including web-scale applications, where labels are scarce or hard to obtain, unsupervised learning is a natural paradigm but it trades off performance. Recently, contrastive learning (CL) has enabled unsupervised computer vision models to compete well against supervised ones. Theoretical and empirical works analyzing visual CL frameworks find that leveraging large datasets and domain aware augmentations is essential for framework success. Interestingly, graph CL frameworks often report high performance while using orders of magnitude smaller data, and employing domain-agnostic augmentations (e.g., node or edge dropping, feature perturbations) that can corrupt the graphs' underlying properties. Motivated by these discrepancies, we seek to determine: (i) why existing graph CL frameworks perform well despite weak augmentations and limited data; and (ii) whether adhering to visual CL principles can improve performance on graph classification tasks. Through extensive analysis, we identify flawed practices in graph data augmentation and evaluation protocols that are commonly used in the graph CL literature, and propose improved practices and sanity checks for future research and applications. We show that on small benchmark datasets, the inductive bias of graph neural networks can significantly compensate for the limitations of existing frameworks. In case studies with relatively larger graph classification tasks, we find that commonly used domain-agnostic augmentations perform poorly, while adhering to principles in visual CL can significantly improve performance. For example, in graph-based document classification, which can be used for better web search, we show task-relevant augmentations improve accuracy by 20%.
Most graph neural networks (GNN) perform poorly in graphs where neighbors typically have different features/classes (heterophily) and when stacking multiple layers (oversmoothing). These two seemingly unrelated problems have been studied independently, but there is recent empirical evidence that solving one problem may benefit the other. In this work, going beyond empirical observations, we theoretically characterize the connections between heterophily and oversmoothing, both of which lead to indistinguishable node representations. By modeling the change in node representations during message propagation, we theoretically analyze the factors (e.g., degree, heterophily level) that make the representations of nodes from different classes indistinguishable. Our analysis highlights that (1) nodes with high heterophily and nodes with low heterophily and low degrees relative to their neighbors (degree discrepancy) trigger the oversmoothing problem, and (2) allowing "negative" messages between neighbors can decouple the heterophily and oversmoothing problems. Based on our insights, we design a model that addresses the discrepancy in features and degrees between neighbors by incorporating signed messages and learned degree corrections. Our experiments on 9 real networks show that our model achieves state-of-the-art performance under heterophily, and performs comparably to existing GNNs under low heterophily(homophily). It also effectively addresses oversmoothing and even benefits from multiple layers.
Federated learning promises to use the computational power of edge devices while maintaining user data privacy. Current frameworks, however, typically make the unrealistic assumption that the data stored on user devices come with ground truth labels, while the server has no data. In this work, we consider the more realistic scenario where the users have only unlabeled data and the server has a limited amount of labeled data. In this semi-supervised federated learning (ssfl) setting, the data distribution can be non-iid, in the sense of different distributions of classes at different users. We define a metric, $R$, to measure this non-iidness in class distributions. In this setting, we provide a thorough study on different factors that can affect the final test accuracy, including algorithm design (such as training objective), the non-iidness $R$, the communication period $T$, the number of users $K$, the amount of labeled data in the server $N_s$, and the number of users $C_k\leq K$ that communicate with the server in each communication round. We evaluate our ssfl framework on Cifar-10, SVHN, and EMNIST. Overall, we find that a simple consistency loss-based method, along with group normalization, achieves better generalization performance, even compared to previous supervised federated learning settings. Furthermore, we propose a novel grouping-based model average method to improve convergence efficiency, and we show that this can boost performance by up to 10.79% on EMNIST, compared to the non-grouping based method.
A significant effort has been made to train neural networks that replicate algorithmic reasoning, but they often fail to learn the abstract concepts underlying these algorithms. This is evidenced by their inability to generalize to data distributions that are outside of their restricted training sets, namely larger inputs and unseen data. We study these generalization issues at the level of numerical subroutines that comprise common algorithms like sorting, shortest paths, and minimum spanning trees. First, we observe that transformer-based sequence-to-sequence models can learn subroutines like sorting a list of numbers, but their performance rapidly degrades as the length of lists grows beyond those found in the training set. We demonstrate that this is due to attention weights that lose fidelity with longer sequences, particularly when the input numbers are numerically similar. To address the issue, we propose a learned conditional masking mechanism, which enables the model to strongly generalize far outside of its training range with near-perfect accuracy on a variety of algorithms. Second, to generalize to unseen data, we show that encoding numbers with a binary representation leads to embeddings with rich structure once trained on downstream tasks like addition or multiplication. This allows the embedding to handle missing data by faithfully interpolating numbers not seen during training.
We investigate the representation power of graph neural networks in the semi-supervised node classification task under heterophily or low homophily, i.e., in networks where connected nodes may have different class labels and dissimilar features. Most existing GNNs fail to generalize to this setting, and are even outperformed by models that ignore the graph structure (e.g., multilayer perceptrons). Motivated by this limitation, we identify a set of key designs -- ego- and neighbor-embedding separation, higher-order neighborhoods, and combination of intermediate representations -- that boost learning from the graph structure under heterophily, and combine them into a new graph convolutional neural network, H2GCN. Going beyond the traditional benchmarks with strong homophily, our empirical analysis on synthetic and real networks shows that, thanks to the identified designs, H2GCN has consistently strong performance across the full spectrum of low-to-high homophily, unlike competitive prior models without them.