Large language models (LLMs) have achieved impressive success across several fields, but their proficiency in understanding and resolving complex graph problems is less explored. To bridge this gap, we introduce GraphInstruct, a novel and comprehensive instruction-tuning dataset designed to equip language models with the ability to tackle a broad spectrum of graph problems using explicit reasoning paths. Utilizing GraphInstruct, we build GraphWiz, an open-source language model capable of resolving various graph problem types while generating clear reasoning processes. To enhance the model's capability and reliability, we incorporate the Direct Preference Optimization (DPO) framework into the graph problem-solving context. The enhanced model, GraphWiz-DPO, achieves an average accuracy of 65% across nine tasks with different complexity levels, surpassing GPT-4 which has an average accuracy of 43.8%. Moreover, our research delves into the delicate balance between training data volume and model performance, highlighting the potential for overfitting with increased data. We also explore the transferability of the model's reasoning ability across different graph tasks, indicating the model's adaptability and practical application potential. Our investigation offers a new blueprint and valuable insights for developing LLMs specialized in graph reasoning and problem-solving.
With a long history of traditional Graph Anomaly Detection (GAD) algorithms and recently popular Graph Neural Networks (GNNs), it is still not clear (1) how they perform under a standard comprehensive setting, (2) whether GNNs outperform traditional algorithms such as tree ensembles, and (3) their efficiency on large-scale graphs. In response, we present GADBench -- a comprehensive benchmark for supervised anomalous node detection on static graphs. GADBench provides a thorough comparison across 23 distinct models on ten real-world GAD datasets ranging from thousands to millions of nodes ($\sim$6M). Our main finding is that tree ensembles with simple neighborhood aggregation outperform all other baselines, including the latest GNNs tailored for the GAD task. By making GADBench available as an open-source tool, we offer pivotal insights into the current advancements of GAD and establish a solid foundation for future research. Our code is available at https://github.com/squareRoot3/GADBench.
Entity alignment is the task of identifying corresponding entities across different knowledge graphs (KGs). Although recent embedding-based entity alignment methods have shown significant advancements, they still struggle to fully utilize KG structural information. In this paper, we introduce FGWEA, an unsupervised entity alignment framework that leverages the Fused Gromov-Wasserstein (FGW) distance, allowing for a comprehensive comparison of entity semantics and KG structures within a joint optimization framework. To address the computational challenges associated with optimizing FGW, we devise a three-stage progressive optimization algorithm. It starts with a basic semantic embedding matching, proceeds to approximate cross-KG structural and relational similarity matching based on iterative updates of high-confidence entity links, and ultimately culminates in a global structural comparison between KGs. We perform extensive experiments on four entity alignment datasets covering 14 distinct KGs across five languages. Without any supervision or hyper-parameter tuning, FGWEA surpasses 21 competitive baselines, including cutting-edge supervised entity alignment methods. Our code is available at https://github.com/squareRoot3/FusedGW-Entity-Alignment.
Data imputation is a prevalent and important task due to the ubiquitousness of missing data. Many efforts try to first draft a completed data and second refine to derive the imputation results, or "draft-then-refine" for short. In this work, we analyze this widespread practice from the perspective of Dirichlet energy. We find that a rudimentary "draft" imputation will decrease the Dirichlet energy, thus an energy-maintenance "refine" step is in need to recover the overall energy. Since existing "refine" methods such as Graph Convolutional Network (GCN) tend to cause further energy decline, in this work, we propose a novel framework called Graph Laplacian Pyramid Network (GLPN) to preserve Dirichlet energy and improve imputation performance. GLPN consists of a U-shaped autoencoder and residual networks to capture global and local detailed information respectively. By extensive experiments on several real-world datasets, GLPN shows superior performance over state-of-the-art methods under three different missing mechanisms. Our source code is available at https://github.com/liguanlue/GLPN.
In this work, we present the Bregman Alternating Projected Gradient (BAPG) method, a single-loop algorithm that offers an approximate solution to the Gromov-Wasserstein (GW) distance. We introduce a novel relaxation technique that balances accuracy and computational efficiency, albeit with some compromises in the feasibility of the coupling map. Our analysis is based on the observation that the GW problem satisfies the Luo-Tseng error bound condition, which relates to estimating the distance of a point to the critical point set of the GW problem based on the optimality residual. This observation allows us to provide an approximation bound for the distance between the fixed-point set of BAPG and the critical point set of GW. Moreover, under a mild technical assumption, we can show that BAPG converges to its fixed point set. The effectiveness of BAPG has been validated through comprehensive numerical experiments in graph alignment and partition tasks, where it outperforms existing methods in terms of both solution quality and wall-clock time.
Graph alignment, which aims at identifying corresponding entities across multiple networks, has been widely applied in various domains. As the graphs to be aligned are usually constructed from different sources, the inconsistency issues of structures and features between two graphs are ubiquitous in real-world applications. Most existing methods follow the ``embed-then-cross-compare'' paradigm, which computes node embeddings in each graph and then processes node correspondences based on cross-graph embedding comparison. However, we find these methods are unstable and sub-optimal when structure or feature inconsistency appears. To this end, we propose SLOTAlign, an unsupervised graph alignment framework that jointly performs Structure Learning and Optimal Transport Alignment. We convert graph alignment to an optimal transport problem between two intra-graph matrices without the requirement of cross-graph comparison. We further incorporate multi-view structure learning to enhance graph representation power and reduce the effect of structure and feature inconsistency inherited across graphs. Moreover, an alternating scheme based algorithm has been developed to address the joint optimization problem in SLOTAlign, and the provable convergence result is also established. Finally, we conduct extensive experiments on six unsupervised graph alignment datasets and the DBP15K knowledge graph (KG) alignment benchmark dataset. The proposed SLOTAlign shows superior performance and strongest robustness over seven unsupervised graph alignment methods and five specialized KG alignment methods.
Mobile CrowdSensing (MCS), through employing considerable workers to sense and collect data in a participatory manner, has been recognized as a promising paradigm for building many large-scale applications in a cost-effective way, such as combating COVID-19. The recruitment of trustworthy and high-quality workers is an important research issue for MCS. Previous studies assume that the qualities of workers are known in advance, or the platform knows the qualities of workers once it receives their collected data. In reality, to reduce their costs and thus maximize revenue, many strategic workers do not perform their sensing tasks honestly and report fake data to the platform. So, it is very hard for the platform to evaluate the authenticity of the received data. In this paper, an incentive mechanism named Semi-supervision based Combinatorial Multi-Armed Bandit reverse Auction (SCMABA) is proposed to solve the recruitment problem of multiple unknown and strategic workers in MCS. First, we model the worker recruitment as a multi-armed bandit reverse auction problem, and design an UCB-based algorithm to separate the exploration and exploitation, considering the Sensing Rates (SRs) of recruited workers as the gain of the bandit. Next, a Semi-supervised Sensing Rate Learning (SSRL) approach is proposed to quickly and accurately obtain the workers' SRs, which consists of two phases, supervision and self-supervision. Last, SCMABA is designed organically combining the SRs acquisition mechanism with multi-armed bandit reverse auction, where supervised SR learning is used in the exploration, and the self-supervised one is used in the exploitation. We prove that our SCMABA achieves truthfulness and individual rationality. Additionally, we exhibit outstanding performances of the SCMABA mechanism through in-depth simulations of real-world data traces.
Graph neural networks (GNNs) are popular weapons for modeling relational data. Existing GNNs are not specified for attribute-incomplete graphs, making missing attribute imputation a burning issue. Until recently, many works notice that GNNs are coupled with spectral concentration, which means the spectrum obtained by GNNs concentrates on a local part in spectral domain, e.g., low-frequency due to oversmoothing issue. As a consequence, GNNs may be seriously flawed for reconstructing graph attributes as graph spectral concentration tends to cause a low imputation precision. In this work, we present a regularized graph autoencoder for graph attribute imputation, named MEGAE, which aims at mitigating spectral concentration problem by maximizing the graph spectral entropy. Notably, we first present the method for estimating graph spectral entropy without the eigen-decomposition of Laplacian matrix and provide the theoretical upper error bound. A maximum entropy regularization then acts in the latent space, which directly increases the graph spectral entropy. Extensive experiments show that MEGAE outperforms all the other state-of-the-art imputation methods on a variety of benchmark datasets.
Graph Neural Networks (GNNs) are widely applied for graph anomaly detection. As one of the key components for GNN design is to select a tailored spectral filter, we take the first step towards analyzing anomalies via the lens of the graph spectrum. Our crucial observation is the existence of anomalies will lead to the `right-shift' phenomenon, that is, the spectral energy distribution concentrates less on low frequencies and more on high frequencies. This fact motivates us to propose the Beta Wavelet Graph Neural Network (BWGNN). Indeed, BWGNN has spectral and spatial localized band-pass filters to better handle the `right-shift' phenomenon in anomalies. We demonstrate the effectiveness of BWGNN on four large-scale anomaly detection datasets. Our code and data are released at https://github.com/squareRoot3/Rethinking-Anomaly-Detection