Junbin Gao

Unifying over-smoothing and over-squashing in graph neural networks: A physics informed approach and beyond

Sep 13, 2023
Zhiqi Shao, Dai Shi, Andi Han, Yi Guo, Qibin Zhao, Junbin Gao

Graph Neural Networks (GNNs) have emerged as one of the leading approaches for machine learning on graph-structured data. Despite their great success, critical computational challenges such as over-smoothing, over-squashing, and limited expressive power continue to impact the performance of GNNs. In this study, inspired by the time-reversal principle commonly used in classical and quantum physics, we reverse the time direction of the graph heat equation. The resulting reverse process yields a class of high-pass filtering functions that enhance the sharpness of graph node features. Leveraging this concept, we introduce the Multi-Scaled Heat Kernel based GNN (MHKG) by amalgamating the effects of diverse filtering functions on node features. To explore more flexible filtering conditions, we further generalize MHKG into a model termed G-MHKG and thoroughly show the role of each element in controlling over-smoothing, over-squashing, and expressive power. Notably, we illustrate that all of the aforementioned issues can be characterized and analyzed via the properties of the filtering functions, and we uncover a trade-off between over-smoothing and over-squashing: enhancing node feature sharpness makes the model suffer more from over-squashing, and vice versa. Furthermore, we manipulate time again to show how G-MHKG can handle both issues under mild conditions. Our experiments highlight the effectiveness of the proposed models, which surpass several GNN baselines across graph datasets characterized by both homophily and heterophily.
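To make the time-reversal idea concrete, here is a minimal sketch of heat-kernel filtering in both time directions, assuming a symmetric normalized Laplacian and dense matrix exponentials; the function names and the time scale t are illustrative, not the paper's implementation:

```python
import numpy as np
from scipy.linalg import expm

def normalized_laplacian(A):
    """Symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))  # assumes no isolated nodes
    return np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

def heat_filters(A, X, t=0.5):
    """Forward heat flow exp(-tL) smooths node features (low-pass);
    reversing time, exp(+tL), sharpens them (high-pass)."""
    L = normalized_laplacian(A)
    return expm(-t * L) @ X, expm(t * L) @ X

# Toy example: a path graph on 4 nodes with scalar features.
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], float)
X = np.array([[1.0], [0.0], [0.0], [1.0]])
X_smooth, X_sharp = heat_filters(A, X)
```

A multi-scaled model in the spirit of MHKG would then combine several such low- and high-pass channels, e.g. with learnable mixing weights.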

Bregman Graph Neural Network

Sep 12, 2023
Jiayu Zhai, Lequan Lin, Dai Shi, Junbin Gao

Much recent research on graph neural networks (GNNs) has focused on formulating GNN architectures as an optimization problem under the smoothness assumption. However, in node classification tasks, the smoothing effect induced by GNNs tends to assimilate representations and over-homogenize the labels of connected nodes, leading to adverse effects such as over-smoothing and misclassification. In this paper, we propose a novel bilevel optimization framework for GNNs inspired by the notion of Bregman distance. We demonstrate that the GNN layer derived from this framework can effectively mitigate the over-smoothing issue by introducing a mechanism reminiscent of the skip connection. We validate our theoretical results through comprehensive empirical studies in which Bregman-enhanced GNNs outperform their original counterparts on both homophilic and heterophilic graphs. Furthermore, our experiments show that Bregman GNNs maintain robust learning accuracy even when the number of layers is high, suggesting the effectiveness of the proposed method in alleviating the over-smoothing issue.
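As a rough illustration of the skip-connection-like mechanism, the sketch below mixes a standard GNN propagation step with the layer input; A_hat, alpha, and the mixing form are generic placeholders, not the paper's bilevel update:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def bregman_like_layer(A_hat, X, W, alpha=0.5):
    """One layer balancing graph smoothing against proximity to the input
    features, reminiscent of a skip connection; a generic stand-in for the
    Bregman-distance-regularized update derived in the paper."""
    propagated = relu(A_hat @ X @ W)             # usual GNN smoothing step
    return alpha * propagated + (1 - alpha) * X  # proximity (skip) term
```

Keeping part of the input signal in every layer is what prevents representations from collapsing as depth grows.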

How Curvature Enhance the Adaptation Power of Framelet GCNs

Jul 19, 2023
Dai Shi, Yi Guo, Zhiqi Shao, Junbin Gao

Graph neural networks (GNNs) have been demonstrated to be powerful in modeling graph-structured data. However, despite many successful applications of GNNs to various graph classification and prediction tasks, it is not yet well understood whether graph geometric information has been fully exploited to enhance their learning performance. This paper introduces a new approach to enhance GNNs with discrete graph Ricci curvature. Specifically, the graph Ricci curvature, defined on the edges of a graph, measures how difficult it is for information to transit along an edge from one node to another based on their neighborhoods. Motivated by the geometric analogy of Ricci curvature in the graph setting, we prove that by inserting the curvature information through carefully designed transformation functions $\zeta$, several known computational issues in GNNs, such as over-smoothing, can be alleviated in our proposed model. Furthermore, we verify that edges with highly positive Ricci curvature (i.e., $\kappa_{i,j} \approx 1$) are preferably dropped to enhance the model's adaptation to heterophilic graphs, and we propose a curvature-based graph edge-drop algorithm. Comprehensive experiments show that our curvature-based GNN model outperforms state-of-the-art baselines on both homophilic and heterophilic graph datasets, indicating the effectiveness of incorporating graph geometric information into GNNs.
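For intuition, a hedged sketch of the edge-drop idea follows, using the augmented Forman curvature as a cheap proxy; the paper's curvature, transformation $\zeta$, and drop criterion differ, and the threshold here is a free hyperparameter:

```python
import numpy as np

def forman_style_curvature(A, i, j):
    """Augmented Forman curvature of edge (i, j) on an unweighted graph:
    4 - deg(i) - deg(j) + 3 * (#triangles containing the edge).
    A stand-in proxy, not the paper's Ricci curvature."""
    deg = A.sum(axis=1)
    triangles = int((A[i] * A[j]).sum())  # common neighbours of i and j
    return 4 - deg[i] - deg[j] + 3 * triangles

def drop_high_curvature_edges(A, threshold=1.0):
    """Remove edges whose curvature exceeds the threshold, mirroring the
    idea that very positively curved edges can be dropped on heterophilic
    graphs."""
    A_new = A.copy()
    for i, j in zip(*np.nonzero(np.triu(A, k=1))):
        if forman_style_curvature(A, i, j) > threshold:
            A_new[i, j] = A_new[j, i] = 0.0
    return A_new
```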

Frameless Graph Knowledge Distillation

Jul 13, 2023
Dai Shi, Zhiqi Shao, Yi Guo, Junbin Gao

Knowledge distillation (KD) has shown great potential for transferring knowledge from a complex teacher model to a simple student model, in which the heavy learning task can be accomplished efficiently without losing much prediction accuracy. Recently, many attempts have been made to apply the KD mechanism to graph representation learning models, such as graph neural networks (GNNs), to accelerate inference via student models. However, many existing KD-based GNNs use an MLP as a universal approximator in the student model to imitate the teacher model without considering the graph knowledge from the teacher. In this work, we provide a KD-based framework for multi-scaled GNNs known as graph framelets, and prove that by adequately utilizing the multi-scaled graph knowledge provided by the framelet decomposition, the student model can adapt to both homophilic and heterophilic graphs and has the potential to alleviate the over-squashing issue through a simple yet effective graph surgery. Furthermore, we show, via both algebra and geometry, how the graph knowledge supplied by the teacher is learned and digested by the student model. Comprehensive experiments show that our proposed model can achieve learning accuracy matching or even surpassing that of the teacher model while maintaining high inference speed.
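For reference, the generic soft-target distillation objective that underlies frameworks like this one can be sketched as below; the temperature, the weighting, and the omission of the framelet-specific multi-scale terms are all simplifications:

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, lam=0.5):
    """Blend of temperature-softened KL against the teacher and hard-label
    cross-entropy - the classic KD objective, sketched here without the
    framelet-specific terms of the paper."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = (p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12))).sum(-1).mean()
    ce = -np.log(softmax(student_logits)[np.arange(len(labels)), labels] + 1e-12).mean()
    return lam * (T ** 2) * kl + (1 - lam) * ce
```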

Combating Confirmation Bias: A Unified Pseudo-Labeling Framework for Entity Alignment

Jul 05, 2023
Qijie Ding, Jie Yin, Daokun Zhang, Junbin Gao

Entity alignment (EA) aims to identify equivalent entity pairs across different knowledge graphs (KGs) that refer to the same real-world identity. To systematically combat confirmation bias in pseudo-labeling-based entity alignment, we propose a Unified Pseudo-Labeling framework for Entity Alignment (UPL-EA) that explicitly eliminates pseudo-labeling errors to boost the accuracy of entity alignment. UPL-EA consists of two complementary components: (1) Optimal Transport (OT)-based pseudo-labeling uses discrete OT modeling as an effective means to determine entity correspondences across two KGs more accurately and to mitigate the adverse impact of erroneous matches. A simple but highly effective criterion is further devised to derive pseudo-labeled entity pairs that satisfy one-to-one correspondences at each iteration. (2) Cross-iteration pseudo-label calibration operates across multiple consecutive iterations to further improve pseudo-labeling precision by reducing the variability of local pseudo-label selection, with a theoretical guarantee. The two components are respectively designed to eliminate the Type I and Type II pseudo-labeling errors identified through our analysis. The calibrated pseudo-labels are then used to augment prior alignment seeds and reinforce subsequent model training for alignment inference. The effectiveness of UPL-EA in eliminating pseudo-labeling errors is both theoretically supported and experimentally validated. The experimental results show that our approach achieves competitive performance with limited prior alignment seeds.
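A minimal sketch of the OT-based matching step, assuming same-sized entity sets, a plain Sinkhorn normalization, and a mutual-best-match selection rule as an illustrative stand-in for the paper's criterion:

```python
import numpy as np

def sinkhorn(S, n_iters=50, tau=0.05):
    """Turn a similarity matrix S between entities of two KGs into an
    (approximately) doubly-stochastic transport plan."""
    K = np.exp(S / tau)
    for _ in range(n_iters):
        K = K / K.sum(axis=1, keepdims=True)  # row normalisation
        K = K / K.sum(axis=0, keepdims=True)  # column normalisation
    return K

def one_to_one_pseudo_labels(S):
    """Keep only mutual best matches, enforcing one-to-one correspondences."""
    P = sinkhorn(S)
    row_best, col_best = P.argmax(axis=1), P.argmax(axis=0)
    return [(i, j) for i, j in enumerate(row_best) if col_best[j] == i]
```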

Variational Counterfactual Prediction under Runtime Domain Corruption

Jun 23, 2023
Hechuan Wen, Tong Chen, Li Kheng Chai, Shazia Sadiq, Junbin Gao, Hongzhi Yin

To date, various neural methods have been proposed for causal effect estimation based on observational data, where a default assumption is that variables follow the same distribution and remain available at both the training and inference (i.e., runtime) stages. However, distribution shift (i.e., domain shift) can occur at runtime, and greater challenges arise from the impaired accessibility of variables. This is commonly caused by increasing privacy and ethical concerns, which can render arbitrary variables unavailable across the entire runtime data and make imputation impractical. We term the co-occurrence of domain shift and inaccessible variables runtime domain corruption, which seriously impairs the generalizability of a trained counterfactual predictor. To counter runtime domain corruption, we subsume counterfactual prediction under the notion of domain adaptation. Specifically, we upper-bound the error w.r.t. the target domain (i.e., runtime covariates) by the sum of the source domain error and an inter-domain distribution distance. In addition, we build an adversarially unified variational causal effect model, named VEGAN, with a novel two-stage adversarial domain adaptation scheme that reduces the latent distribution disparity first between treated and control groups, and then between training and runtime variables. We demonstrate that VEGAN outperforms other state-of-the-art baselines on individual-level treatment effect estimation in the presence of runtime domain corruption on benchmark datasets.
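A common building block for such adversarial domain adaptation is the gradient reversal layer, sketched below in PyTorch; this is a standard trick, not necessarily VEGAN's exact implementation:

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass, negated (scaled) gradient in the
    backward pass - the standard gradient-reversal trick."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

# A domain discriminator trained on grad_reverse(features) pushes the
# encoder towards domain-invariant representations; applying this once
# between treated/control groups and once between training/runtime data
# mirrors the two-stage scheme described above.
```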

Efficient and Interpretable Compressive Text Summarisation with Unsupervised Dual-Agent Reinforcement Learning

Jun 06, 2023
Peggy Tang, Junbin Gao, Lei Zhang, Zhiyong Wang

Compressive text summarisation offers a balance between the conciseness issue of extractive summarisation and the factual hallucination issue of abstractive summarisation. However, most existing compressive summarisation methods are supervised, relying on the expensive effort of creating a new training dataset with corresponding compressive summaries. In this paper, we propose an efficient and interpretable compressive summarisation method that utilises unsupervised dual-agent reinforcement learning to optimise a summary's semantic coverage and fluency by simulating human judgement of summarisation quality. Our model consists of an extractor agent and a compressor agent, both of which have a multi-head attentional pointer-based structure. The extractor agent first chooses salient sentences from a document, and the compressor agent then compresses these extracted sentences by selecting salient words to form a summary, without using reference summaries to compute the summary reward. To the best of our knowledge, this is the first work on unsupervised compressive summarisation. Experimental results on three widely used datasets (Newsroom, CNN/DM, and XSum) show that our model achieves promising performance, a significant improvement on Newsroom in terms of the ROUGE metric, and interpretability via the semantic coverage of summarisation results.
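One way to picture a reference-free reward is the sketch below, which combines semantic coverage (embedding similarity between summary and source) with a fluency term; the weighting and both inputs are hypothetical, not the paper's reward definition:

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def unsupervised_reward(doc_emb, summary_emb, fluency_score, w=0.7):
    """Reference-free reward: coverage of the source document plus a
    fluency term (e.g. from a language model), with illustrative weight w."""
    return w * cosine(doc_emb, summary_emb) + (1 - w) * fluency_score
```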

* The 4th Workshop on Simple and Efficient Natural Language Processing (SustaiNLP 2023), co-located with ACL 2023 

Revisiting Generalized p-Laplacian Regularized Framelet GCNs: Convergence, Energy Dynamic and Training with Non-Linear Diffusion

May 25, 2023
Dai Shi, Zhiqi Shao, Yi Guo, Qibin Zhao, Junbin Gao

This work presents a comprehensive theoretical analysis of graph p-Laplacian based framelet network (pL-UFG) to establish a solid understanding of its properties. We begin by conducting a convergence analysis of the p-Laplacian based implicit layer integrated after the framelet convolution, providing insights into the asymptotic behavior of pL-UFG. By exploring the generalized Dirichlet energy of pL-UFG, we demonstrate that the Dirichlet energy remains non-zero, ensuring the avoidance of over-smoothing issues in pL-UFG as it approaches convergence. Furthermore, we elucidate the dynamic energy perspective through which the implicit layer in pL-UFG synergizes with graph framelets, enhancing the model's adaptability to both homophilic and heterophilic data. Remarkably, we establish that the implicit layer can be interpreted as a generalized non-linear diffusion process, enabling training using diverse schemes. These multifaceted analyses lead to unified conclusions that provide novel insights for understanding and implementing pL-UFG, contributing to advancements in the field of graph-based deep learning.
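The non-linear diffusion interpretation can be illustrated with an explicit-scheme graph p-Laplacian iteration; the step size, p, and iteration count below are arbitrary choices, and the paper's implicit layer is solved differently:

```python
import numpy as np

def p_laplacian_diffusion(A, X, p=1.5, tau=0.1, n_steps=20, eps=1e-8):
    """Explicit non-linear diffusion on a graph: each step moves a node's
    features towards its neighbours, with edge influence modulated by
    |x_j - x_i|^(p-2); p = 2 recovers ordinary (linear) heat diffusion."""
    X = X.astype(float).copy()
    for _ in range(n_steps):
        update = np.zeros_like(X)
        for i, j in zip(*np.nonzero(A)):
            diff = X[j] - X[i]
            weight = (np.linalg.norm(diff) + eps) ** (p - 2)
            update[i] += A[i, j] * weight * diff
        X += tau * update
    return X
```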

Diffusion Models for Time Series Applications: A Survey

May 01, 2023
Lequan Lin, Zhengkun Li, Ruikun Li, Xuliang Li, Junbin Gao

Diffusion models, a family of deep generative models, have become increasingly prominent in cutting-edge machine learning research. With distinguished performance in generating samples that resemble the observed data, diffusion models are now widely used in image, video, and text synthesis. In recent years, the concept of diffusion has been extended to time-series applications, and many powerful models have been developed. Given the lack of a methodical summary and discussion of these models, we provide this survey as an elementary resource for new researchers in this area and as inspiration for future research. For better understanding, we include an introduction to the basics of diffusion models. Beyond this, we primarily focus on diffusion-based methods for time-series forecasting, imputation, and generation, presenting each in its own section. We also compare different methods for the same application and highlight their connections where applicable. Lastly, we conclude with the common limitations of diffusion-based methods and highlight potential future research directions.
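For readers new to the area, the closed-form forward (noising) process shared by DDPM-style models, applied here to a toy time-series window, looks as follows; the linear beta schedule is one common choice, not specific to any surveyed model:

```python
import numpy as np

def forward_noising(x0, t, betas):
    """Closed-form DDPM forward step:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps.
    The denoising network is then trained to predict eps from (x_t, t)."""
    alpha_bar = np.cumprod(1.0 - betas)[t]
    eps = np.random.randn(*x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps, eps

betas = np.linspace(1e-4, 0.02, 1000)        # illustrative linear schedule
series = np.sin(np.linspace(0.0, 6.28, 64))  # toy time-series window
xt, eps = forward_noising(series, t=500, betas=betas)
```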
