Over the past 20 years, artificial neural networks have become dominant in many areas while continually growing in scale. However, current analysis of large models has focused mainly on functionality, overlooking how differences in scale influence their properties. To address this, we propose the concept of Emergence Learning, which emphasizes the significance of scale. By studying models of different scales, we identify a key factor behind the higher performance of large models: a decrease in monosemantic neurons. Building on this insight, we propose a proactive approach that inhibits monosemanticity to improve performance. Our solution is a two-phase process consisting of monosemantic neuron detection and inhibition, supported by theoretical analysis. Experimental results on various tasks and neural networks demonstrate the effectiveness of the proposed method. Although Emergence Learning draws its inspiration from scaling phenomena, the applicability of our method is not restricted to large scale, so the experiments are self-contained. Extending this research to very large-scale datasets is appealing, yet not feasible for research departments with limited resources. We are delighted to share the first co-authorship and eagerly await collaboration from any AI company before submission.
Recently, graph neural networks (GNNs) have shown unprecedented success in many graph-related tasks. However, like other neural networks, GNNs face the label scarcity issue. Recent efforts therefore pre-train GNNs on a large-scale unlabeled graph and adapt the knowledge learned from it to the target downstream task. The adaptation is generally achieved by fine-tuning the pre-trained GNNs with a limited amount of labeled data. Despite the importance of fine-tuning, current GNN pre-training works often neglect to design a good fine-tuning strategy that better leverages the transferred knowledge and improves performance on downstream tasks. Only a few works have started to investigate better fine-tuning strategies for pre-trained GNNs, and their designs either rely on strong assumptions or overlook the data-aware issue across different downstream datasets. In this paper, we therefore aim to design a better fine-tuning strategy for pre-trained GNNs to improve model performance. Given a pre-trained GNN, we propose to search to fine-tune pre-trained graph neural networks for graph-level tasks (S2PGNN), which adaptively designs a suitable fine-tuning framework for the given labeled data of the downstream task. To ensure that searching for a fine-tuning strategy brings an improvement, we carefully summarize a search space of fine-tuning frameworks that is suitable for GNNs. Empirical studies show that S2PGNN can be implemented on top of 10 well-known pre-trained GNNs and consistently improves their performance. Moreover, S2PGNN achieves better performance than existing fine-tuning strategies both within and outside the GNN area. Our code is publicly available at \url{https://anonymous.4open.science/r/code_icde2024-A9CB/}.
Recently, Graph Neural Networks (GNNs) have gained popularity in a variety of real-world scenarios. Despite this success, the architecture design of GNNs relies heavily on manual labor. Automated graph neural networks (AutoGNN) have therefore attracted growing attention from the research community and have delivered significant performance improvements in recent years. However, existing AutoGNN works mainly model and leverage the link information in graphs implicitly, which is not well regularized for the link prediction task and limits the performance of AutoGNN on other graph tasks. In this paper, we present a novel AutoGNN work that explicitly models the link information, abbreviated as AutoGEL. In this way, AutoGEL can handle the link prediction task and improve the performance of AutoGNNs on the node classification and graph classification tasks. Specifically, AutoGEL proposes a novel search space that covers various design dimensions at both the intra-layer and inter-layer levels, and adopts a more robust differentiable search algorithm to further improve efficiency and effectiveness. Experimental results on benchmark data sets demonstrate the superiority of AutoGEL on several tasks.
The scoring function, which measures the plausibility of triplets in knowledge graphs (KGs), is the key to ensuring good performance in KG embedding, and its design is an important problem in the literature. Automated machine learning (AutoML) techniques have recently been introduced into KG embedding to design task-aware scoring functions, which achieve state-of-the-art performance. However, the effectiveness of the searched scoring functions is still not as good as desired. In this paper, observing that existing scoring functions can exhibit distinct performance on different semantic patterns, we are motivated to exploit such semantics by searching for relation-aware scoring functions. The relation-aware search, however, requires a much larger search space than the task-aware one. Hence, we encode the space as a supernet and propose an efficient alternative minimization algorithm to search through the supernet in a one-shot manner. Finally, experimental results on benchmark datasets demonstrate that the proposed method can efficiently search for relation-aware scoring functions and achieves better embedding performance than state-of-the-art methods.
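As a concrete picture of what a scoring function is, the sketch below implements two classic hand-designed ones, DistMult and TransE, in plain Python. They are stand-ins chosen for illustration, not the functions found by the search; the relation names, the toy embeddings, and the `RELATION_TO_SCORER` table are hypothetical.

```python
# Illustrative only: classic hand-designed scoring functions, not the
# relation-aware functions produced by the search described above.

def distmult_score(h, r, t):
    """DistMult: sum_i h_i * r_i * t_i (higher means more plausible)."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


def transe_score(h, r, t):
    """TransE with L1 distance: -||h + r - t||_1 (higher means more plausible)."""
    return -sum(abs(hi + ri - ti) for hi, ri, ti in zip(h, r, t))


# Hypothetical assignment of one scoring function per relation. A
# relation-aware search would choose this assignment automatically.
RELATION_TO_SCORER = {
    "spouse_of": distmult_score,  # symmetric relation
    "born_in": transe_score,      # asymmetric relation
}


def score_triplet(relation_name, h, r, t):
    """Score (h, r, t) with the scoring function assigned to the relation."""
    return RELATION_TO_SCORER[relation_name](h, r, t)


# Hypothetical toy embeddings for head, relation, and tail.
h = [1.0, 0.0, 2.0]
r = [0.5, 1.0, -1.0]
t = [2.0, 3.0, 1.0]

distmult_value = distmult_score(h, r, t)
transe_value = transe_score(h, r, t)
```

DistMult returns the same score when head and tail are swapped, which suits symmetric relations, whereas TransE generally does not. This is the kind of difference across semantic patterns that motivates choosing a scoring function per relation.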
Tensors, which extend vectors and matrices to the multi-dimensional case, are a natural way to describe N-ary relational data. Recently, tensor decomposition methods have been introduced for N-ary relational data and have become the state of the art in embedding learning. However, the performance of existing tensor decomposition methods is not as good as desired. First, they suffer from the data-sparsity issue, since they can only learn from N-ary relational data with a specific arity, i.e., from parts of common N-ary relational data. Second, they are neither effective nor efficient to train because of the over-parameterization problem. In this paper, we propose a novel method, S2S, for effectively and efficiently learning from N-ary relational data. Specifically, we propose a new tensor decomposition framework that allows embedding sharing in order to learn from facts with mixed arity. Since the core tensors may still suffer from over-parameterization, we reduce the number of parameters by sparsifying the core tensors while retaining their expressive power, using neural architecture search (NAS) techniques that can search for data-dependent architectures. As a result, the proposed S2S is not only guaranteed to be expressive but also learns efficiently from mixed arity. Finally, empirical results demonstrate that S2S is efficient to train and achieves state-of-the-art performance.
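To make the over-parameterization point concrete, the following sketch scores a fact with a generic Tucker-style decomposition whose core tensor is stored sparsely. It is a minimal illustration under stated assumptions, not the S2S architecture: the function `tucker_score`, the dictionary representation of the core, and all toy values are hypothetical.

```python
from itertools import product


def tucker_score(core, relation, entities):
    """Tucker-style score of an N-ary fact.

    Computes the sum over stored core entries of
    w * relation[i0] * entities[0][i1] * ... * entities[N-1][iN].
    `core` maps index tuples to non-zero weights, so entries that are
    zero cost neither storage nor computation.
    """
    total = 0.0
    for index, weight in core.items():
        term = weight * relation[index[0]]
        for entity, i in zip(entities, index[1:]):
            term *= entity[i]
        total += term
    return total


dim = 2

# Dense core for a binary fact (relation, head, tail): dim**3 = 8 weights.
dense_core = {idx: 1.0 for idx in product(range(dim), repeat=3)}

# Sparsified core that keeps only the diagonal entries: dim = 2 weights.
sparse_core = {(i, i, i): 1.0 for i in range(dim)}

# Hypothetical toy embeddings.
relation = [1.0, 2.0]
head = [3.0, 0.0]
tail = [1.0, 4.0]

dense_value = tucker_score(dense_core, relation, [head, tail])
sparse_value = tucker_score(sparse_core, relation, [head, tail])
```

A dense core grows as `dim ** (arity + 1)`, while the diagonal core here grows only linearly in `dim`. The same function accepts facts of any arity as long as the index tuples in `core` have matching length, so the same entity embeddings can be reused across arities.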