Graph-structured data is ubiquitous and widely applicable across diverse domains, such as social network analysis, biochemistry, financial fraud detection, and network security. Significant strides have been made in leveraging Graph Neural Networks (GNNs) to achieve remarkable success in these areas. However, in real-world scenarios, the training environment for models is often far from ideal, leading to substantial performance degradation of GNN models due to various unfavorable factors, including imbalanced data distributions, noise and errors in the data, privacy protection requirements for sensitive information, and the need to generalize to out-of-distribution (OOD) scenarios. To tackle these issues, substantial efforts have been devoted to improving the performance of GNN models in practical real-world scenarios, as well as enhancing their reliability and robustness. In this paper, we present a comprehensive survey that systematically reviews existing GNN models, focusing on solutions to these four real-world challenges (imbalance, noise, privacy, and OOD), which many existing reviews have not considered. Specifically, we first highlight the four key challenges faced by existing GNNs, paving the way for our exploration of real-world GNN models. Subsequently, we provide detailed discussions on these four aspects, dissecting how these solutions contribute to enhancing the reliability and robustness of GNN models. Last but not least, we outline promising directions and offer future perspectives in the field.
We investigate node representation learning on text-attributed graphs (TAGs), where nodes are associated with text information. Although recent studies on graph neural networks (GNNs) and pretrained language models (PLMs) have exhibited their power in encoding network and text signals, respectively, less attention has been paid to delicately coupling these two types of models on TAGs. Specifically, existing GNNs rarely model the text in each node in a contextualized way, and existing PLMs can hardly be applied to characterize graph structures due to their sequence architecture. To address these challenges, we propose HASH-CODE, a High-frequency Aware Spectral Hierarchical Contrastive Selective Coding method that integrates GNNs and PLMs into a unified model. Different from previous "cascaded architectures" that directly add GNN layers upon a PLM, our HASH-CODE relies on five self-supervised optimization objectives to facilitate thorough mutual enhancement between network and text signals at diverse granularities. Moreover, we show that existing contrastive objectives learn the low-frequency component of the augmentation graph, and we propose a high-frequency component (HFC)-aware contrastive learning objective that makes the learned embeddings more distinctive. Extensive experiments on six real-world benchmarks substantiate the efficacy of our proposed approach. In addition, theoretical analysis and item embedding visualization provide insights into our model's interpretability.
Representing the information of multiple behaviors in a single graph collaborative filtering (CF) vector has been a long-standing challenge. This is because different behaviors naturally form separate behavior graphs and learn separate CF embeddings. Existing models merge the separate embeddings by designating the CF embeddings of some behaviors as the primary embedding and using the embeddings of the auxiliary behaviors to enhance it. However, this approach often results in the joint embedding performing well on the main tasks but poorly on the auxiliary ones. To address the problem arising from the separate behavior graphs, we propose the concept of Partial Order Graphs (POG). POG defines the partial order relation of multiple behaviors and models behavior combinations as weighted edges to merge separate behavior graphs into a joint POG. A theoretical proof shows that POG can be generalized to any given set of multiple behaviors. Based on POG, we propose the tailored Partial Order Graph Convolutional Networks (POGCN), which convolve neighbors' information while considering the behavior relations between users and items. POGCN also introduces a partial-order BPR sampling strategy for efficient and effective multiple-behavior CF training. POGCN has been successfully deployed on the homepage of Alibaba for two months, providing recommendation services for over one billion users. Extensive offline experiments conducted on three public benchmark datasets demonstrate that POGCN outperforms state-of-the-art multi-behavior baselines across all types of behaviors. Furthermore, online A/B tests confirm the superiority of POGCN in billion-scale recommender systems.
Predicting Click-Through Rate (CTR) in billion-scale recommender systems poses a long-standing challenge for Graph Neural Networks (GNNs) due to the overwhelming computational complexity involved in aggregating billions of neighbors. To tackle this, GNN-based CTR models usually sample hundreds of neighbors out of the billions to facilitate efficient online recommendations. However, sampling only a small portion of neighbors results in severe sampling bias and fails to capture the full spectrum of user or item behavioral patterns. To address this challenge, we refer to the conventional user-item recommendation graph as the "micro recommendation graph" and introduce a more suitable MAcro Recommendation Graph (MAG) for billion-scale recommendations. MAG resolves the computational complexity problems in the infrastructure by reducing the node count from billions to hundreds. Specifically, MAG groups micro nodes (users and items) with similar behavior patterns to form macro nodes. Subsequently, we introduce tailored Macro Graph Neural Networks (MacGNN) to aggregate information on a macro level and revise the embeddings of macro nodes. MacGNN has already served Taobao's homepage feed for two months, providing recommendations for over one billion users. Extensive offline experiments on three public benchmark datasets and an industrial dataset show that MacGNN significantly outperforms twelve CTR baselines while remaining computationally efficient. In addition, online A/B tests confirm MacGNN's superiority in billion-scale recommender systems.
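The grouping step described in this abstract can be illustrated with a minimal sketch. It assumes the assignment of micro nodes to macro nodes is already given as a dictionary (`group_of`, a hypothetical name); the abstract does not say how behavioral similarity is measured, and the function names below are illustrative rather than taken from MacGNN.

```python
from collections import defaultdict


def build_macro_graph(edges, group_of):
    """Collapse micro (user, item) interactions into weighted macro edges.

    edges: iterable of (user, item) pairs from the micro graph.
    group_of: dict mapping each micro node to its macro node id
              (assumed to be given; how groups are formed is not shown here).
    """
    macro_edges = defaultdict(int)
    for user, item in edges:
        # Every micro interaction adds one unit of weight to its macro edge.
        macro_edges[(group_of[user], group_of[item])] += 1
    return dict(macro_edges)


def macro_neighbor_weights(macro_edges, macro_node):
    """Return normalised edge weights from one macro node to its neighbours."""
    nbrs = {dst: w for (src, dst), w in macro_edges.items() if src == macro_node}
    total = sum(nbrs.values())
    return {dst: w / total for dst, w in nbrs.items()} if total else {}
```

With such a structure, aggregation touches one entry per macro neighbour instead of one per micro neighbour, which is the source of the reduction in node count that the abstract describes.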
Traffic forecasting is a complex multivariate time-series regression task of paramount importance for traffic management and planning. However, existing approaches often struggle to model complex multi-range dependencies using local spatiotemporal features and road network hierarchical knowledge. To address this, we propose MultiSPANS. First, considering that an individual recording point cannot reflect critical spatiotemporal local patterns, we design multi-filter convolution modules for generating informative ST-token embeddings to facilitate attention computation. Then, based on ST-tokens and spatial-temporal position encoding, we employ Transformers to capture long-range temporal and spatial dependencies. Furthermore, we introduce structural entropy theory to optimize the spatial attention mechanism. Specifically, the structural entropy minimization algorithm is used to generate optimal road network hierarchies, i.e., encoding trees. Based on this, we propose a relative structural entropy-based position encoding and a multi-head attention masking scheme based on multi-layer encoding trees. Extensive experiments on real-world traffic datasets demonstrate the superiority of the presented framework over several state-of-the-art methods and show that longer historical windows are effectively utilized. The code is available at https://github.com/SELGroup/MultiSPANS.
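To make the structural entropy idea more concrete, the sketch below computes the standard one-dimensional structural entropy of an undirected graph and the two-dimensional structural entropy under a given partition (an encoding tree of height two). It only evaluates a partition and does not perform the minimization the abstract refers to. The function names and the `community_of` mapping are assumptions for illustration and are not taken from the MultiSPANS code; a simple undirected graph without self-loops is assumed.

```python
import math
from collections import defaultdict


def degrees(edges):
    """Degree of every node in an undirected edge list."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def one_dim_entropy(edges):
    """One-dimensional structural entropy: entropy of the degree distribution."""
    deg = degrees(edges)
    vol = sum(deg.values())  # equals 2 * number of edges
    return -sum(d / vol * math.log2(d / vol) for d in deg.values())


def two_dim_entropy(edges, community_of):
    """Two-dimensional structural entropy of a partition (height-2 encoding tree)."""
    deg = degrees(edges)
    vol = sum(deg.values())
    com_vol = defaultdict(int)  # sum of degrees inside each community
    for node, d in deg.items():
        com_vol[community_of[node]] += d
    cut = defaultdict(int)  # number of edges leaving each community
    for u, v in edges:
        if community_of[u] != community_of[v]:
            cut[community_of[u]] += 1
            cut[community_of[v]] += 1
    node_term = -sum(
        d / vol * math.log2(d / com_vol[community_of[n]]) for n, d in deg.items()
    )
    com_term = -sum(cut[c] / vol * math.log2(com_vol[c] / vol) for c in com_vol)
    return node_term + com_term
```

A partition that matches the graph's community structure yields a lower two-dimensional entropy than the one-dimensional value, while the trivial partition into singletons recovers the one-dimensional value exactly.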
POI recommendation is practically important for facilitating various Location-Based Social Network services and has recently attracted growing research attention. Existing works generally assume that the available POI check-ins reported by users are the ground-truth depiction of user behaviors. However, in real application scenarios, the check-in data can be rather unreliable due to both subjective and objective causes, including positioning error and user privacy concerns, leading to significant negative impacts on the performance of POI recommendation. To this end, we investigate a novel problem of robust POI recommendation by considering the uncertainty factors of the user check-ins, and propose a Bayes-enhanced Multi-view Attention Network (BayMAN). Specifically, we construct a personal POI transition graph, a semantic-based POI graph, and a distance-based POI graph to comprehensively model the dependencies among the POIs. As the personal POI transition graph is usually sparse and sensitive to noise, we design a Bayes-enhanced spatial dependency learning module for data augmentation from the local view. A Bayesian posterior guided graph augmentation approach is adopted to generate a new graph with collaborative signals to increase the data diversity. Then both the original and the augmented graphs are used for POI representation learning to counteract the data uncertainty issue. Next, the POI representations of the three view graphs are fed into the proposed multi-view attention-based user preference learning module. By incorporating the semantic and distance correlations of POIs, the user preference can be effectively refined, and robust recommendation results are finally achieved. The results of extensive experiments show that BayMAN significantly outperforms the state-of-the-art methods in POI recommendation when the available check-ins are incomplete and noisy.
Spatial time series imputation is critically important to many real applications such as intelligent transportation and air quality monitoring. Although recent transformer-based and diffusion-model-based approaches have achieved significant performance gains compared with conventional statistics-based methods, spatial time series imputation remains a challenging issue due to the complex spatio-temporal dependencies and the noise uncertainty of spatial time series data. In particular, recent diffusion-process-based models may introduce random noise into the imputations and thus negatively affect model performance. To this end, we propose a self-adaptive noise scaling diffusion model named SaSDim to perform spatial time series imputation more effectively. Specifically, we propose a new loss function that can scale the noise to a similar intensity, and propose an across spatial-temporal global convolution module to more effectively capture the dynamic spatial-temporal dependencies. Extensive experiments conducted on three real-world datasets verify the effectiveness of SaSDim in comparison with current state-of-the-art baselines.
Although the Transformer has achieved great success in natural language processing and computer vision, it has difficulty generalizing to medium and large-scale graph data for two important reasons: (i) high complexity and (ii) failure to capture complex and entangled structure information. In graph representation learning, Graph Neural Networks (GNNs) can fuse the graph structure and node attributes but have limited receptive fields. Therefore, we ask whether we can combine Transformers and GNNs so that they help each other. In this paper, we propose a new model named TransGNN, where the Transformer layer and GNN layer are used alternately to improve each other. Specifically, to expand the receptive field and disentangle the information aggregation from edges, we propose using the Transformer to aggregate more relevant nodes' information to improve the message passing of GNNs. Besides, to capture the graph structure information, we utilize positional encoding and make use of the GNN layer to fuse the structure into node attributes, which improves the Transformer on graph data. We also propose to sample the most relevant nodes for the Transformer, together with two efficient sample update strategies, to lower the complexity. Finally, we theoretically prove that TransGNN is more expressive than GNNs with only extra linear complexity. The experiments on eight datasets corroborate the effectiveness of TransGNN on node and graph classification tasks.
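The idea of restricting attention to the most relevant nodes can be sketched as a top-k selection. The abstract does not state how relevance is measured, so cosine similarity between node feature vectors is used here purely as an assumption, and `top_k_relevant` is a hypothetical helper name rather than part of TransGNN.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_relevant(features, node, k):
    """Indices of the k nodes most similar to `node`, excluding the node itself."""
    scores = [
        (cosine(features[node], feat), j)
        for j, feat in enumerate(features)
        if j != node
    ]
    # nlargest sorts by score first; ties are broken by the larger index.
    return [j for _, j in heapq.nlargest(k, scores)]
```

Attending over only k sampled nodes per target node keeps the attention cost linear in the number of nodes for fixed k, instead of quadratic when every node attends to every other node.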
Many real-world graph learning tasks require handling dynamic graphs where new nodes and edges emerge. Dynamic graph learning methods commonly suffer from the catastrophic forgetting problem, where knowledge learned for previous graphs is overwritten by updates for new graphs. To alleviate the problem, continual graph learning methods have been proposed. However, existing continual graph learning methods aim to learn new patterns and maintain old ones with the same set of parameters of fixed size, and thus face a fundamental tradeoff between the two goals. In this paper, we propose Parameter Isolation GNN (PI-GNN) for continual learning on dynamic graphs, which circumvents the tradeoff via parameter isolation and expansion. Our motivation is that different parameters contribute to learning different graph patterns. Based on this idea, we expand model parameters to continually learn emerging graph patterns. Meanwhile, to effectively preserve knowledge for unaffected patterns, we find the parameters that correspond to them via optimization and freeze them to prevent them from being rewritten. Experiments on eight real-world datasets corroborate the effectiveness of PI-GNN compared to state-of-the-art baselines.
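The combination of parameter isolation and expansion can be illustrated with a small, framework-free sketch. It assumes the indices of the parameters to freeze are already known (the abstract says they are found via optimization, which is not reproduced here), and the class `IsolatedParams` with its method names is hypothetical, not part of PI-GNN.

```python
class IsolatedParams:
    """A flat parameter vector supporting freezing and expansion."""

    def __init__(self, values):
        self.values = list(values)
        self.frozen = [False] * len(self.values)

    def freeze(self, indices):
        """Isolate parameters that encode patterns that must be preserved."""
        for i in indices:
            self.frozen[i] = True

    def expand(self, n_new, init=0.0):
        """Append fresh trainable parameters for newly emerging patterns."""
        self.values.extend([init] * n_new)
        self.frozen.extend([False] * n_new)

    def sgd_step(self, grads, lr=0.1):
        """Gradient step that leaves frozen parameters untouched."""
        if len(grads) != len(self.values):
            raise ValueError("one gradient per parameter is required")
        for i, g in enumerate(grads):
            if not self.frozen[i]:
                self.values[i] -= lr * g
```

For example, after `freeze([0])` and `expand(1)` on a two-parameter vector, a gradient step updates only the second and the newly added parameter, so the knowledge stored in the first one cannot be overwritten.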
Graph structure learning aims to learn connectivity in a graph from data. It is particularly important for many computer vision tasks, since in most cases no explicit graph structure is available for images. A natural way to construct a graph among images is to treat each image as a node and assign pairwise image similarities as weights to the corresponding edges. It is well known that pairwise similarities between images are sensitive to noise in the feature representations, leading to unreliable graph structures. We address this problem from the viewpoint of statistical tests. By viewing the feature vector of each node as an independent sample, the decision of whether to create an edge between two nodes based on their similarity in feature representation can be thought of as a ${\it single}$ statistical test. To improve the robustness of the decision to create an edge, multiple samples are drawn and integrated by ${\it multiple}$ statistical tests to generate a more reliable similarity measure and, consequently, a more reliable graph structure. The corresponding elegant matrix form, named $\mathcal{B}\textbf{-Attention}$, is designed for efficiency. The effectiveness of multiple tests for graph structure learning is verified both theoretically and empirically on multiple clustering and ReID benchmark datasets. Source code is available at https://github.com/Thomas-wyh/B-Attention.
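The contrast between a single comparison and an aggregate of several comparisons can be sketched as follows. This is only one plausible reading under stated assumptions: the single score is a plain dot product between two feature vectors, and the aggregated score averages, over all nodes taken as reference samples, the product of the two nodes' similarities to each reference. It is not claimed to be the B-Attention form from the paper, and the function names are hypothetical.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def single_test_similarity(features):
    """Pairwise similarity from one direct comparison per node pair."""
    n = len(features)
    return [[dot(features[i], features[j]) for j in range(n)] for i in range(n)]


def multi_test_similarity(features):
    """Pairwise similarity that averages agreement over all reference nodes.

    Entry (i, j) is the mean over k of s[i][k] * s[j][k], i.e. S @ S.T / n
    in matrix form, where S is the single-comparison similarity matrix.
    """
    s = single_test_similarity(features)
    n = len(s)
    return [
        [sum(s[i][k] * s[j][k] for k in range(n)) / n for j in range(n)]
        for i in range(n)
    ]
```

Because each aggregated entry depends on n comparisons rather than one, noise in a single feature vector has less influence on whether an edge is created, which is the intuition the abstract appeals to.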