Signed Graph Neural Networks (SGNNs) are vital for analyzing complex patterns in real-world signed graphs containing positive and negative links. However, three key challenges hinder current SGNN-based signed graph representation learning: sparsity in signed graphs leaves latent structures undiscovered, unbalanced triangles pose representation difficulties for SGNN models, and real-world signed graph datasets often lack supplementary information like node labels and features. These constraints limit the potential of SGNN-based representation learning. We address these issues with data augmentation techniques. Despite many graph data augmentation methods existing for unsigned graphs, none are tailored for signed graphs. Our paper introduces the novel Signed Graph Augmentation framework (SGA), comprising three main components. First, we employ the SGNN model to encode the signed graph, extracting latent structural information for candidate augmentation structures. Second, we evaluate these candidate samples (edges) and select the most beneficial ones for modifying the original training set. Third, we propose a novel augmentation perspective that assigns varying training difficulty to training samples, enabling the design of a new training strategy. Extensive experiments on six real-world datasets (Bitcoin-alpha, Bitcoin-otc, Epinions, Slashdot, Wiki-elec, and Wiki-RfA) demonstrate that SGA significantly improves performance across multiple benchmarks. Our method outperforms baselines by up to 22.2% in AUC for SGCN on Wiki-RfA, 33.3% in F1-binary, 48.8% in F1-micro, and 36.3% in F1-macro for GAT on Bitcoin-alpha in link sign prediction.
We introduce CLUE, a Chinese Language Understanding Evaluation benchmark. It contains eight different tasks, including single-sentence classification, sentence pair classification, and machine reading comprehension. We evaluate CLUE on a number of existing full-network pre-trained models for Chinese. We also include a small hand-crafted diagnostic test set designed to probe specific linguistic phenomena using different models, some of which are unique to Chinese. Along with CLUE, we release a large clean crawled raw text corpus that can be used for model pre-training. We release CLUE, baselines and pre-training dataset on Github.