Get our free extension to see links to code for papers anywhere online!Free extension: code links for papers anywhere!Free add-on: See code for papers anywhere!

See Hian Lee, Feng Ji, Kelin Xia, Wee Peng Tay

Traditionally, graph neural networks have been trained using a single observed graph. However, the observed graph represents only one possible realization. In many applications, the graph may encounter uncertainties, such as having erroneous or missing edges, as well as edge weights that provide little informative value. To address these challenges and capture additional information previously absent in the observed graph, we introduce latent variables to parameterize and generate multiple graphs. We obtain the maximum likelihood estimate of the network parameters in an Expectation-Maximization (EM) framework based on the multiple graphs. Specifically, we iteratively determine the distribution of the graphs using a Markov Chain Monte Carlo (MCMC) method, incorporating the principles of PAC-Bayesian theory. Numerical experiments demonstrate improvements in performance against baseline models on node classification for heterogeneous graphs and graph regression on chemistry datasets.

Via

Xingchao Jian, Feng Ji, Wee Peng Tay

This correspondence points out a technical error in Proposition 4 of the paper [1]. Because of this error, the proofs of Lemma 3, Theorem 1, Theorem 3, Proposition 2, and Theorem 4 in that paper are no longer valid. We provide counterexamples to Proposition 4 and discuss where the flaw in its proof lies. We also provide numerical evidence indicating that Lemma 3, Theorem 1, and Proposition 2 are likely to be false. Since the proof of Theorem 4 depends on the validity of Proposition 4, we propose an amendment to the statement of Theorem 4 of the paper using convergence in operator norm and prove this rigorously. In addition, we also provide a construction that guarantees convergence in the sense of Proposition 4.

Via

Purui Zhang, Xingchao Jian, Feng Ji, Wee Peng Tay, Bihan Wen

Topological Signal Processing (TSP) utilizes simplicial complexes to model structures with higher order than vertices and edges. In this paper, we study the transferability of TSP via a generalized higher-order version of graphon, known as complexon. We recall the notion of a complexon as the limit of a simplicial complex sequence. Inspired by the integral operator form of graphon shift operators, we construct a marginal complexon and complexon shift operator (CSO) according to components of all possible dimensions from the complexon. We investigate the CSO's eigenvalues and eigenvectors, and relate them to a new family of weighted adjacency matrices. We prove that when a simplicial complex sequence converges to a complexon, the eigenvalues of the corresponding CSOs converge to that of the limit complexon. This conclusion is further verified by a numerical experiment. These results hint at learning transferability on large simplicial complexes or simplicial complex sequences, which generalize the graphon signal processing framework.

Via

Purui Zhang, Xingchao Jian, Feng Ji, Wee Peng Tay, Bihan Wen

Topological signal processing (TSP) utilizes simplicial complexes to model structures with higher order than vertices and edges. In this paper, we study the transferability of TSP via a generalized higher-order version of graphon, known as complexon. We recall the notion of a complexon as the limit of a simplicial complex sequence [1]. Inspired by the integral operator form of graphon shift operators, we construct a marginal complexon and complexon shift operator (CSO) according to components of all possible dimensions from the complexon. We investigate the CSO's eigenvalues and eigenvectors, and relate them to a new family of weighted adjacency matrices. We prove that when a simplicial complex sequence converges to a complexon, the eigenvalues of the corresponding CSOs converge to that of the limit complexon. These results hint at learning transferability on large simplicial complexes or simplicial complex sequences, which generalize the graphon signal processing framework.

Via

Xingchao Jian, Feng Ji, Wee Peng Tay

Graphons have traditionally served as limit objects for dense graph sequences, with the cut distance serving as the metric for convergence. However, sparse graph sequences converge to the trivial graphon under the conventional definition of cut distance, which make this framework inadequate for many practical applications. In this paper, we utilize the concepts of generalized graphons and stretched cut distance to describe the convergence of sparse graph sequences. Specifically, we consider a random graph process generated from a generalized graphon. This random graph process converges to the generalized graphon in stretched cut distance. We use this random graph process to model the growing sparse graph, and prove the convergence of the adjacency matrices' eigenvalues. We supplement our findings with experimental validation. Our results indicate the possibility of transfer learning between sparse graphs.

Via

Rui Ding, Jielong Yang, Feng Ji, Xionghu Zhong, Linbo Xie

Due to inappropriate sample selection and limited training data, a distribution shift often exists between the training and test sets. This shift can adversely affect the test performance of Graph Neural Networks (GNNs). Existing approaches mitigate this issue by either enhancing the robustness of GNNs to distribution shift or reducing the shift itself. However, both approaches necessitate retraining the model, which becomes unfeasible when the model structure and parameters are inaccessible. To address this challenge, we propose FR-GNN, a general framework for GNNs to conduct feature reconstruction. FRGNN constructs a mapping relationship between the output and input of a well-trained GNN to obtain class representative embeddings and then uses these embeddings to reconstruct the features of labeled nodes. These reconstructed features are then incorporated into the message passing mechanism of GNNs to influence the predictions of unlabeled nodes at test time. Notably, the reconstructed node features can be directly utilized for testing the well-trained model, effectively reducing the distribution shift and leading to improved test performance. This remarkable achievement is attained without any modifications to the model structure or parameters. We provide theoretical guarantees for the effectiveness of our framework. Furthermore, we conduct comprehensive experiments on various public datasets. The experimental results demonstrate the superior performance of FRGNN in comparison to mainstream methods.

Via

Feng Ji, Wee Peng Tay, Antonio Ortega

In this expository article, we provide a self-contained overview of the notion of convolution embedded in different theories: from the classical Fourier theory to the theory of algebraic signal processing. We discuss their relations and differences. Toward the end, we provide an opinion on whether there is a consistent approach to convolution that unifies seemingly different approaches by different theories.

Via

Qianglong Chen, Feng Ji, Feng-Lin Li, Guohai Xu, Ming Yan, Ji Zhang, Yin Zhang

Knowledge distillation is of key importance to launching multilingual pre-trained language models for real applications. To support cost-effective language inference in multilingual settings, we propose AMTSS, an adaptive multi-teacher single-student distillation framework, which allows distilling knowledge from multiple teachers to a single student. We first introduce an adaptive learning strategy and teacher importance weight, which enables a student to effectively learn from max-margin teachers and easily adapt to new languages. Moreover, we present a shared student encoder with different projection layers in support of multiple languages, which contributes to largely reducing development and machine cost. Experimental results show that AMTSS gains competitive results on the public XNLI dataset and the realistic industrial dataset AliExpress (AE) in the E-commerce scenario.

Via

Xingchao Jian, Feng Ji, Wee Peng Tay

Topological signal processing (TSP) over simplicial complexes typically assumes observations associated with the simplicial complexes are real scalars. In this paper, we develop TSP theories for the case where observations belong to abelian groups more general than real numbers, including function spaces that are commonly used to represent time-varying signals. Our approach generalizes the Hodge decomposition and allows for signal processing tasks to be performed on these more complex observations. We propose a unified and flexible framework for TSP that expands its applicability to a wider range of signal processing applications. Numerical results demonstrate the effectiveness of this approach and provide a foundation for future research in this area.

Via