Chuan Shi

FairSIN: Achieving Fairness in Graph Neural Networks through Sensitive Information Neutralization

Mar 19, 2024

Cheng Yang, Jixi Liu, Yunhe Yan, Chuan Shi

Abstract: Despite the remarkable success of graph neural networks (GNNs) in modeling graph-structured data, GNNs, like other machine learning models, are susceptible to making biased predictions based on sensitive attributes such as race and gender. To improve fairness, recent state-of-the-art (SOTA) methods filter out sensitive information from inputs or representations, e.g., by edge dropping or feature masking. However, we argue that such filtering-based strategies may also filter out some non-sensitive feature information, leading to a sub-optimal trade-off between predictive performance and fairness. To address this issue, we introduce a neutralization-based paradigm, where additional Fairness-facilitating Features (F3) are incorporated into node features or representations before message passing. The F3 are expected to statistically neutralize the sensitive bias in node representations and provide additional non-sensitive information. We also provide theoretical explanations for our rationale, concluding that F3 can be realized by emphasizing the features of each node's heterogeneous neighbors (neighbors with different sensitive attributes). We name our method FairSIN and present three implementation variants from both data-centric and model-centric perspectives. Experimental results on five benchmark datasets with three different GNN backbones show that FairSIN significantly improves fairness metrics while maintaining high prediction accuracy.
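
The idea of emphasizing heterogeneous neighbors can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the plain averaging, and the scaling factor `delta` are assumptions made for illustration. For each node, it averages the features of neighbors whose sensitive attribute differs and adds that average to the node's own features.

```python
def heterogeneous_neighbor_mean(features, edges, sensitive):
    """Average, per node, the features of neighbors with a different sensitive attribute."""
    n, dim = len(features), len(features[0])
    sums = [[0.0] * dim for _ in range(n)]
    counts = [0] * n
    for u, v in edges:  # edges are treated as undirected
        if sensitive[u] != sensitive[v]:
            for node, other in ((u, v), (v, u)):
                counts[node] += 1
                for k in range(dim):
                    sums[node][k] += features[other][k]
    # Nodes without heterogeneous neighbors get a zero vector (an assumption).
    return [
        [s / counts[i] for s in sums[i]] if counts[i] else [0.0] * dim
        for i in range(n)
    ]


def neutralize(features, edges, sensitive, delta=1.0):
    """Add the delta-scaled heterogeneous-neighbor mean to each node's features."""
    extra = heterogeneous_neighbor_mean(features, edges, sensitive)
    return [
        [x + delta * e for x, e in zip(row, extra_row)]
        for row, extra_row in zip(features, extra)
    ]


# Toy graph: three nodes, node 1 belongs to a different sensitive group.
features = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
edges = [(0, 1), (1, 2), (0, 2)]
sensitive = [0, 1, 0]
augmented = neutralize(features, edges, sensitive, delta=0.5)
```

In this toy graph, nodes 0 and 2 share a sensitive group, so the edge between them contributes nothing; only the edges to node 1 do.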

BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences

Mar 14, 2024

Sun Ao, Weilin Zhao, Xu Han, Cheng Yang, Zhiyuan Liu, Chuan Shi, Maosong Sun, Shengnan Wang, Teng Su

Abstract: Effective attention modules have played a crucial role in the success of Transformer-based large language models (LLMs), but the quadratic time and memory complexities of these attention modules also pose a challenge when processing long sequences. One potential solution to the long-sequence problem is to use distributed clusters to parallelize the computation of attention modules across multiple devices (e.g., GPUs). However, adopting a distributed approach inevitably introduces extra memory overheads to store local attention results and incurs additional communication costs to aggregate local results into global ones. In this paper, we propose a distributed attention framework named "BurstAttention" to optimize memory access and communication operations at both the global cluster and local device levels. In our experiments, we compare BurstAttention with other competitive distributed attention solutions for long sequence processing. The experimental results under different length settings demonstrate that BurstAttention offers significant advantages for processing long sequences compared with these competitive baselines, reducing communication overheads by 40% and achieving a 2× speedup when training with a 32K sequence length on 8×A100.
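
The step of aggregating local attention results into global ones can be sketched as follows. This is a generic blockwise softmax merge for a single query on a single machine, not BurstAttention's algorithm; all function names are hypothetical. Each chunk of keys and values yields a local maximum score, a local sum of exponentials, and an unnormalized output, and two such results are merged by rescaling both to a shared maximum.

```python
import math


def local_attention(q, keys, values):
    """Attend one query to one chunk; return (max_score, exp_sum, unnormalized_output)."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    out = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(len(values[0]))]
    return m, sum(weights), out


def merge(a, b):
    """Combine two local results by rescaling both to their shared maximum score."""
    (ma, sa, oa), (mb, sb, ob) = a, b
    m = max(ma, mb)
    ca, cb = math.exp(ma - m), math.exp(mb - m)
    return m, sa * ca + sb * cb, [x * ca + y * cb for x, y in zip(oa, ob)]


def chunked_attention(q, keys, values, chunk_size):
    """Process keys/values chunk by chunk, then normalize the merged result once."""
    state = None
    for start in range(0, len(keys), chunk_size):
        part = local_attention(q, keys[start:start + chunk_size], values[start:start + chunk_size])
        state = part if state is None else merge(state, part)
    _, total, out = state
    return [x / total for x in out]


q = [1.0, 0.5]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 2.0], [0.5, 0.5]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
chunked = chunked_attention(q, keys, values, chunk_size=2)
full = chunked_attention(q, keys, values, chunk_size=len(keys))
```

Because each chunk only contributes a maximum, a sum, and one output vector, the merged result matches full softmax attention up to floating-point error. In a distributed setting the chunks would live on different devices, which this sketch does not model.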

* 13 pages, 7 figures

Learning Invariant Representations of Graph Neural Networks via Cluster Generalization

Mar 06, 2024

Donglin Xia, Xiao Wang, Nian Liu, Chuan Shi

Abstract: Graph neural networks (GNNs) have become increasingly popular in modeling graph-structured data due to their ability to learn node representations by aggregating local structure information. However, it is widely acknowledged that the test graph structure may differ from the training graph structure, resulting in a structure shift. In this paper, we experimentally find that the performance of GNNs drops significantly when a structure shift occurs, suggesting that the learned models may be biased towards specific structure patterns. To address this challenge, we propose the Cluster Information Transfer (CIT) mechanism (code available at https://github.com/BUPT-GAMMA/CITGNN), which can learn invariant representations for GNNs, thereby improving their generalization ability to various and unknown test graphs with structure shift. The CIT mechanism achieves this by combining different cluster information with the nodes while preserving their cluster-independent information. By generating nodes across different clusters, the mechanism significantly enhances the diversity of the nodes and helps GNNs learn invariant representations. We provide a theoretical analysis of the CIT mechanism, showing that the impact of changing clusters during structure shift can be mitigated after transfer. Additionally, the proposed mechanism is a plug-in that can be easily used to improve existing GNNs. We comprehensively evaluate our proposed method on three typical structure shift scenarios, demonstrating its effectiveness in enhancing GNNs' performance.

Minimum Topology Attacks for Graph Neural Networks

Mar 05, 2024

Mengmei Zhang, Xiao Wang, Chuan Shi, Lingjuan Lyu, Tianchi Yang, Junping Du

Abstract: With the great popularity of Graph Neural Networks (GNNs), their robustness to adversarial topology attacks has received significant attention. Although many attack methods have been proposed, they mainly focus on fixed-budget attacks, which aim to find the most adversarial perturbations within a fixed budget for a target node. However, because robustness varies from node to node, a fixed budget creates an inevitable dilemma: no successful perturbation is found when the budget is relatively small, while if it is too large, the resulting redundant perturbations hurt invisibility. To break this dilemma, we propose a new type of topology attack, named minimum-budget topology attack, which aims to adaptively find the minimum perturbation sufficient for a successful attack on each node. To this end, we propose an attack model, named MiBTack, based on a dynamic projected gradient descent algorithm, which can effectively solve the resulting non-convex constrained optimization on discrete topology. Extensive results on three GNNs and four real-world datasets show that MiBTack can successfully cause all target nodes to be misclassified with the minimum number of perturbed edges. Moreover, the obtained minimum budget can be used to measure node robustness, so we can explore the relationships among robustness, topology, and uncertainty for nodes, which is beyond what current fixed-budget topology attacks can offer.

* Published on WWW 2023. Proceedings of the ACM Web Conference 2023

GraphTranslator: Aligning Graph Model to Large Language Model for Open-ended Tasks

Feb 28, 2024

Mengmei Zhang, Mingwei Sun, Peng Wang, Shen Fan, Yanhu Mo, Xiaoxiao Xu, Hong Liu, Cheng Yang, Chuan Shi

Abstract: Large language models (LLMs) like ChatGPT, which exhibit powerful zero-shot and instruction-following capabilities, have catalyzed a revolutionary transformation across diverse fields, especially for open-ended tasks. This idea is less explored in the graph domain: despite the availability of numerous powerful graph models (GMs), they are restricted to tasks in a pre-defined form. Although several methods applying LLMs to graphs have been proposed, using the LLM as a node feature enhancer or as a standalone predictor, they fail to handle pre-defined and open-ended tasks simultaneously. To break this dilemma, we propose to bridge the pretrained GM and LLM with a Translator, named GraphTranslator, aiming to leverage the GM to handle pre-defined tasks effectively and to use the extended interface of LLMs to offer various open-ended tasks for the GM. To train such a Translator, we propose a Producer capable of constructing graph-text alignment data along node information, neighbor information, and model information. By translating node representations into tokens, GraphTranslator empowers an LLM to make predictions based on language instructions, providing a unified perspective for both pre-defined and open-ended tasks. Extensive results demonstrate the effectiveness of the proposed GraphTranslator on zero-shot node classification. The graph question answering experiments reveal the potential of GraphTranslator across a broad spectrum of open-ended tasks through language instructions. Our code is available at: https://github.com/alibaba/GraphTranslator.

Endowing Pre-trained Graph Models with Provable Fairness

Feb 20, 2024

Zhongjian Zhang, Mengmei Zhang, Yue Yu, Cheng Yang, Jiawei Liu, Chuan Shi

Abstract: Pre-trained graph models (PGMs) aim to capture transferable inherent structural properties and apply them to different downstream tasks. Similar to pre-trained language models, PGMs also inherit biases from human society, resulting in discriminatory behavior in downstream applications. The debiasing process of existing fair methods is generally coupled with parameter optimization of GNNs. However, different downstream tasks may be associated with different sensitive attributes in reality, so directly employing existing methods to improve the fairness of PGMs is inflexible and inefficient. Moreover, most of them lack a theoretical guarantee, i.e., provable lower bounds on the fairness of model predictions, which directly provides assurance in a practical scenario. To overcome these limitations, we propose a novel adapter-tuning framework that endows pre-trained graph models with provable fairness (called GraphPAR). GraphPAR freezes the parameters of PGMs and trains a parameter-efficient adapter to flexibly improve the fairness of PGMs in downstream tasks. Specifically, we design a sensitive semantic augmenter on node representations to extend the node representations with different sensitive attribute semantics for each node. The extended representations are then used to train an adapter that prevents the propagation of sensitive attribute semantics from PGMs to task predictions. Furthermore, with GraphPAR, we quantify whether the fairness of each node is provable, i.e., whether predictions are always fair within a certain range of sensitive attribute semantics. Experimental evaluations on real-world datasets demonstrate that GraphPAR achieves state-of-the-art prediction performance and fairness on the node classification task. Furthermore, with GraphPAR, around 90% of nodes have provable fairness.

* Accepted by WWW 2024

Graph Fairness Learning under Distribution Shifts

Jan 30, 2024

Yibo Li, Xiao Wang, Yujie Xing, Shaohua Fan, Ruijia Wang, Yaoqi Liu, Chuan Shi

Abstract: Graph neural networks (GNNs) have achieved remarkable performance on graph-structured data. However, GNNs may inherit prejudice from the training data and make discriminatory predictions based on sensitive attributes, such as gender and race. Recently, there has been increasing interest in ensuring fairness in GNNs, but existing methods all assume that the training and testing data follow the same distribution, i.e., that they come from the same graph. Will graph fairness performance decrease under distribution shifts? How do distribution shifts affect graph fairness learning? These open questions are largely unexplored from a theoretical perspective. To answer them, we first theoretically identify the factors that determine bias on a graph. Subsequently, we explore the factors influencing fairness on testing graphs, with a noteworthy factor being the representation distances of certain groups between the training and testing graphs. Motivated by our theoretical analysis, we propose our framework FatraGNN. Specifically, to guarantee fairness performance on unknown testing graphs, we propose a graph generator to produce numerous graphs with significant bias and under different distributions. Then we minimize the representation distances for each such group between the training graph and the generated graphs. This empowers our model to achieve high classification and fairness performance even on generated graphs with significant bias, thereby effectively handling unknown testing graphs. Experiments on real-world and semi-synthetic datasets demonstrate the effectiveness of our model in terms of both accuracy and fairness.

* Accepted by WWW 2024

Graph Contrastive Invariant Learning from the Causal Perspective

Jan 23, 2024

Yanhu Mo, Xiao Wang, Shaohua Fan, Chuan Shi

Abstract: Graph contrastive learning (GCL), which learns node representations by contrasting two augmented graphs in a self-supervised way, has attracted considerable attention. GCL is usually believed to learn invariant representations. However, does this understanding always hold in practice? In this paper, we first study GCL from the perspective of causality. By analyzing GCL with the structural causal model (SCM), we discover that traditional GCL may not learn invariant representations well due to the non-causal information contained in the graph. How can we fix this and encourage current GCL to learn better invariant representations? The SCM offers two requirements and motivates us to propose a novel GCL method. In particular, we introduce spectral graph augmentation to simulate intervention upon non-causal factors. Then we design an invariance objective and an independence objective to better capture the causal factors. Specifically, (i) the invariance objective encourages the encoder to capture the invariant information contained in causal variables, and (ii) the independence objective aims to reduce the influence of confounders on the causal variables. Experimental results demonstrate the effectiveness of our approach on node classification tasks.

Graph Invariant Learning with Subgraph Co-mixup for Out-Of-Distribution Generalization

Dec 18, 2023

Tianrui Jia, Haoyang Li, Cheng Yang, Tao Tao, Chuan Shi

Abstract: Graph neural networks (GNNs) have been demonstrated to perform well in graph representation learning, but they lack generalization capability when tackling out-of-distribution (OOD) data. Graph invariant learning methods, backed by the invariance principle among multiple defined environments, have shown effectiveness in dealing with this issue. However, existing methods rely heavily on well-predefined or accurately generated environment partitions, which are hard to obtain in practice, leading to sub-optimal OOD generalization performance. In this paper, we propose a novel graph invariant learning method based on a co-mixup strategy over invariant and variant patterns, which is capable of jointly generating mixed multiple environments and capturing invariant patterns from the mixed graph data. Specifically, we first adopt a subgraph extractor to identify invariant subgraphs. Subsequently, we design a novel co-mixup strategy, i.e., jointly conducting environment Mixup and invariant Mixup. For the environment Mixup, we mix the variant environment-related subgraphs so as to generate sufficiently diverse multiple environments, which is important to guarantee the quality of graph invariant learning. For the invariant Mixup, we mix the invariant subgraphs, further encouraging the model to capture invariant patterns behind graphs while getting rid of spurious correlations for OOD generalization. We demonstrate that the proposed environment Mixup and invariant Mixup can mutually promote each other. Extensive experiments on both synthetic and real-world datasets demonstrate that our method significantly outperforms state-of-the-art methods under various distribution shifts.

* Has been accepted at the 38th AAAI Conference on Artificial Intelligence (AAAI-24)

A Generalized Neural Diffusion Framework on Graphs

Dec 14, 2023

Yibo Li, Xiao Wang, Hongrui Liu, Chuan Shi

Abstract: Recent studies reveal the connection between GNNs and the diffusion process, which has motivated many diffusion-based GNNs. However, since these two mechanisms are closely related, one fundamental question naturally arises: Is there a general diffusion framework that can formally unify these GNNs? The answer to this question can not only deepen our understanding of the learning process of GNNs, but may also open a new door to designing a broad new class of GNNs. In this paper, we propose a general diffusion equation framework with a fidelity term, which formally establishes the relationship between the diffusion process and more GNNs. Meanwhile, with this framework, we identify one characteristic of graph diffusion networks, i.e., the current neural diffusion process only corresponds to the first-order diffusion equation. However, through an experimental investigation, we show that the labels of high-order neighbors actually exhibit the monophily property, which induces similarity based on labels among high-order neighbors without requiring similarity among first-order neighbors. This discovery motivates us to design a new high-order neighbor-aware diffusion equation and to derive a new type of graph diffusion network (HiD-Net) based on the framework. With the high-order diffusion equation, HiD-Net is more robust against attacks and works on both homophily and heterophily graphs. We not only theoretically analyze the relation between HiD-Net and high-order random walks, but also provide a theoretical convergence guarantee. Extensive experimental results demonstrate the effectiveness of HiD-Net over state-of-the-art graph diffusion networks.
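
A first-order diffusion process with a fidelity term can be illustrated with a minimal sketch. It is a generic explicit-Euler discretization on scalar node features, not the HiD-Net equation: the neighbor-averaging form, the function names, and the parameters `step` and `fidelity` are assumptions. Each update pulls a node toward the mean of its first-order neighbors, while the fidelity term pulls it back toward its initial value.

```python
def diffusion_step(x, x0, neighbors, step=0.5, fidelity=0.1):
    """One explicit Euler step: neighbor averaging plus a pull toward the initial features."""
    new_x = []
    for i, xi in enumerate(x):
        nbrs = neighbors[i]
        # Isolated nodes have no diffusion term (an assumption).
        avg = sum(x[j] for j in nbrs) / len(nbrs) if nbrs else xi
        new_x.append(xi + step * ((avg - xi) + fidelity * (x0[i] - xi)))
    return new_x


def diffuse(x0, neighbors, steps=200, step=0.5, fidelity=0.1):
    """Run repeated diffusion steps starting from the initial features x0."""
    x = list(x0)
    for _ in range(steps):
        x = diffusion_step(x, x0, neighbors, step, fidelity)
    return x


# Path graph 0 - 1 - 2 with one scalar feature per node.
neighbors = {0: [1], 1: [0, 2], 2: [1]}
x0 = [0.0, 0.0, 3.0]
smoothed = diffuse(x0, neighbors, fidelity=0.0)   # no fidelity: features collapse together
anchored = diffuse(x0, neighbors, fidelity=1.0)   # fidelity keeps nodes near their inputs
```

On this toy graph, setting `fidelity` to zero drives all nodes to a common value, while a positive `fidelity` keeps the nodes distinguishable. The sketch only uses first-order neighbors, which is the limitation the abstract attributes to existing neural diffusion processes.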

* Accepted by AAAI 2024
