
Zuoyu Yan, Tengfei Ma, Liangcai Gao, Zhi Tang, Chao Chen, Yusu Wang

Cycles are fundamental elements in graph-structured data and have demonstrated their effectiveness in enhancing graph learning models. To encode such information into a graph learning framework, prior works often extract a summary quantity, ranging from the number of cycles to more sophisticated persistence diagram summaries. However, more detailed information, such as which edges appear in a cycle, has not yet been used in graph neural networks. In this paper, we take a step towards addressing this gap and propose a structure encoding module, called CycleNet, that encodes cycle information via edge structure encoding in a permutation invariant manner. To efficiently encode the space of all cycles, we start with a cycle basis (i.e., a minimal set of cycles generating the cycle space), which we compute via the kernel of the 1-dimensional Hodge Laplacian of the input graph. To guarantee that the encoding is invariant w.r.t. the choice of cycle basis, we encode the cycle information via the orthogonal projector of the cycle basis, which is inspired by the BasisNet proposed by Lim et al. We also develop a more efficient variant, which however requires that the input graph has a unique shortest cycle basis. To demonstrate the effectiveness of the proposed module, we provide some theoretical understanding of its expressive power. Moreover, we show via a range of experiments that networks enhanced by our CycleNet module perform better on various benchmarks than several existing SOTA models.
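To make the cycle-space encoding concrete, the following numpy sketch (an illustration only, not the authors' implementation) builds the node-edge incidence (boundary) matrix of a small graph, forms the 1-dimensional Hodge Laplacian, which for a graph with no 2-cells reduces to B1^T B1, and returns the orthogonal projector onto its kernel, i.e., onto the cycle space. The projector does not depend on which particular cycle basis is extracted from the kernel.

```python
import numpy as np

def cycle_space_projector(edges, num_nodes, tol=1e-10):
    """Orthogonal projector onto the cycle space (kernel of the 1-dim Hodge Laplacian).

    edges: list of (u, v) pairs with an arbitrary but fixed orientation.
    Returns an |E| x |E| matrix that is invariant to the choice of cycle basis.
    """
    # Node-edge incidence (boundary) matrix B1: column e has -1 at the tail, +1 at the head.
    B1 = np.zeros((num_nodes, len(edges)))
    for e, (u, v) in enumerate(edges):
        B1[u, e] = -1.0
        B1[v, e] = 1.0

    # With no 2-cells, the 1-dimensional Hodge Laplacian is L1 = B1^T B1.
    L1 = B1.T @ B1

    # Orthonormal basis of ker(L1) = cycle space, from the (near-)zero eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(L1)
    Z = eigvecs[:, np.abs(eigvals) < tol]   # |E| x dim(cycle space)

    # Orthogonal projector Z Z^T; independent of which basis Z was picked.
    return Z @ Z.T

# Toy example: a 4-cycle with one chord has a 2-dimensional cycle space.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
P = cycle_space_projector(edges, num_nodes=4)
print(np.round(np.trace(P)))  # trace equals the first Betti number, here 2.0
```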


Puoya Tabaghi, Yusu Wang

A main object of our study is multiset functions -- that is, permutation-invariant functions over inputs of varying sizes. Deep Sets, proposed by Zaheer et al. (2017), provides a universal representation for continuous multiset functions on scalars via a sum-decomposable model. Restricting the domain of the functions to finite multisets of $D$-dimensional vectors, Deep Sets also provides a universal approximation that requires a latent space dimension of $O(N^D)$ -- where $N$ is an upper bound on the size of input multisets. In this paper, we strengthen this result by proving that universal representation is guaranteed for continuous and discontinuous multiset functions through a latent space dimension of $O(N^D)$. We then introduce identifiable multisets, for which we can uniquely label their elements using an identifier function; in particular, multisets of finite-precision vectors are identifiable. Using our analysis of identifiable multisets, we prove that a sum-decomposable model for general continuous multiset functions only requires a latent dimension of $2DN$. We further show that both the encoder and decoder functions of the model are continuous -- our main contribution over existing work, which lacks such a guarantee. This also provides a significant improvement over the aforementioned $O(N^D)$ bound, which was derived for universal representation of continuous and discontinuous multiset functions. We then extend our results and provide special sum-decomposition structures to universally represent permutation-invariant tensor functions on identifiable tensors. These families of sum-decomposition models enable us to design deep network architectures and deploy them on a variety of learning tasks on sequences, images, and graphs.
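For readers unfamiliar with sum-decomposable models, the minimal sketch below shows the generic form f(X) = rho(sum over x in X of phi(x)) and its permutation invariance; the encoder phi and decoder rho here are arbitrary placeholder networks with an illustrative latent dimension, not the constructions or bounds analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
D, latent_dim = 3, 16            # illustrative sizes, not the paper's bounds

# Placeholder continuous encoder phi: R^D -> R^latent_dim, and decoder rho.
W_phi = rng.normal(size=(D, latent_dim))
W_rho = rng.normal(size=(latent_dim, 1))

def phi(x):
    return np.tanh(x @ W_phi)

def rho(z):
    return np.tanh(z @ W_rho)

def multiset_model(X):
    """Sum-decomposable model f(X) = rho(sum_i phi(x_i)); invariant to row
    permutations and defined for multisets of any size."""
    return rho(phi(X).sum(axis=0))

X = rng.normal(size=(5, D))                      # a multiset of 5 vectors in R^D
assert np.allclose(multiset_model(X), multiset_model(X[::-1]))  # permutation invariance
```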


Samantha Chen, Yusu Wang

Learning distance functions between complex objects, such as the Wasserstein distance to compare point sets, is a common goal in machine learning applications. However, functions on such complex objects (e.g., point sets and graphs) are often required to be invariant to a wide variety of group actions, e.g., permutation or rigid transformation. Therefore, continuous and symmetric product functions (such as distance functions) on such complex objects must also be invariant to the product of such group actions. We call these functions symmetric and factor-wise group invariant (SFGI functions, for short). In this paper, we first present a general neural network architecture for approximating SFGI functions. The main contribution of this paper combines this general architecture with a sketching idea to develop a specific and efficient neural network which can approximate the $p$-th Wasserstein distance between point sets. Very importantly, the required model complexity is independent of the sizes of the input point sets. On the theoretical front, to the best of our knowledge, this is the first result showing that there exists a neural network with the capacity to approximate the Wasserstein distance with bounded model complexity. Our work provides an interesting integration of sketching ideas for geometric problems with universal approximation of symmetric functions. On the empirical front, we present a range of results showing that our newly proposed neural network architecture performs comparably to or better than other models (including a SOTA Siamese autoencoder based approach). In particular, our neural network generalizes significantly better and trains much faster than the SOTA Siamese AE. Finally, this line of investigation could be useful in exploring effective neural network design for solving a broad range of geometric optimization problems (e.g., $k$-means in a metric space).
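As a purely schematic illustration of the shape of such an architecture (untrained placeholder weights and a crude mean-pooling encoder, not the sketching-based encoder or the guarantees of the paper), the following defines a function of two point sets that is symmetric in its two arguments and invariant to permutations within each set.

```python
import numpy as np

rng = np.random.default_rng(1)
D, latent_dim = 2, 32                      # illustrative sizes

W_enc = rng.normal(size=(D, latent_dim))   # placeholder, untrained weights
W_out = rng.normal(size=(latent_dim, 1))

def embed(points):
    """Permutation-invariant embedding of one point set via mean pooling,
    so the output does not depend on the ordering of the points."""
    return np.tanh(points @ W_enc).mean(axis=0)

def sfgi_surrogate(A, B):
    """Symmetric, factor-wise permutation-invariant surrogate distance:
    symmetric in (A, B) and invariant to reordering points within A or B."""
    zA, zB = embed(A), embed(B)
    return float(np.abs(np.abs(zA - zB) @ W_out))   # |zA - zB| makes the map symmetric

A = rng.normal(size=(10, D))
B = rng.normal(size=(7, D))
assert np.isclose(sfgi_surrogate(A, B), sfgi_surrogate(B, A))        # symmetry
assert np.isclose(sfgi_surrogate(A, B), sfgi_surrogate(A[::-1], B))  # factor-wise invariance
```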


Tristan Brugère, Zhengchao Wan, Yusu Wang

(Directed) graphs with node attributes are a common type of data in various applications, and there is a vast literature on developing metrics and efficient algorithms for comparing them. Recently, in the graph learning and optimization communities, a range of new approaches have been developed for comparing graphs with node attributes, leveraging ideas such as Optimal Transport (OT) and the Weisfeiler-Lehman (WL) graph isomorphism test. Two state-of-the-art representatives are the OTC distance proposed by O'Connor et al., 2022 and the WL distance by Chen et al., 2022. Interestingly, while these two distances are developed based on different ideas, we observe that they both view graphs as Markov chains and are deeply connected. Indeed, in this paper, we propose a unified framework to generate distances for Markov chains (thus including (directed) graphs with node attributes), which we call the Optimal Transport Markov (OTM) distances, encompassing both the OTC and the WL distances. We further introduce a special one-parameter family of distances within our OTM framework, called the discounted WL distance. We show that the discounted WL distance has nice theoretical properties and can address several limitations of the existing OTC and WL distances. Furthermore, contrary to the OTC and the WL distances, we show that our new discounted WL distance is differentiable (after an entropy regularization similar to the Sinkhorn distance), making it suitable for use in learning frameworks, e.g., as the reconstruction loss in a graph generative model.
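For background on the entropy-regularized optimal transport mentioned above, the following is a standard Sinkhorn iteration between two discrete distributions; it is a generic building block (with an illustrative regularization strength), not the OTM or discounted WL construction itself.

```python
import numpy as np

def sinkhorn(mu, nu, C, eps=0.1, n_iters=200):
    """Entropy-regularized OT cost <P, C> between histograms mu, nu with cost matrix C."""
    K = np.exp(-C / eps)                 # Gibbs kernel
    u = np.ones_like(mu)
    for _ in range(n_iters):
        v = nu / (K.T @ u)               # alternating Sinkhorn scaling updates
        u = mu / (K @ v)
    P = np.diag(u) @ K @ np.diag(v)      # regularized transport plan
    return float((P * C).sum())

mu = np.array([0.5, 0.5])
nu = np.array([0.25, 0.75])
C = np.array([[0.0, 1.0],
              [1.0, 0.0]])
print(sinkhorn(mu, nu, C))               # close to the unregularized OT cost of 0.25
```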


Mitchell Black, Amir Nayyeri, Zhengchao Wan, Yusu Wang

Message passing graph neural networks are popular learning architectures for graph-structured data. However, it can be challenging for them to capture long range interactions in graphs. One of the potential reasons is the so-called oversquashing problem, first termed in [Alon and Yahav, 2020], which has recently received significant attention. In this paper, we analyze the oversquashing problem through the lens of effective resistance between nodes in the input graphs. The concept of effective resistance intuitively captures the "strength" of connection between two nodes by paths in the graph, and has a rich literature connecting spectral graph theory and circuit theory. We propose to use the total effective resistance as a measure to quantify the total amount of oversquashing in a graph, and provide theoretical justification for its use. We further develop algorithms to identify edges to be added to an input graph so as to minimize the total effective resistance, thereby alleviating the oversquashing problem when using GNNs. We provide empirical evidence of the effectiveness of our total-effective-resistance-based rewiring strategies.
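A minimal numpy sketch of the quantities involved (illustrative only, not the paper's algorithms): pairwise effective resistances from the pseudoinverse of the graph Laplacian, the resulting total effective resistance, and a brute-force scan for the single non-edge whose addition reduces it the most.

```python
import numpy as np
from itertools import combinations

def laplacian(edges, n):
    L = np.zeros((n, n))
    for u, v in edges:
        L[u, u] += 1; L[v, v] += 1
        L[u, v] -= 1; L[v, u] -= 1
    return L

def effective_resistance(Lpinv, u, v):
    # R(u, v) = L+_uu + L+_vv - 2 L+_uv for a connected graph.
    return Lpinv[u, u] + Lpinv[v, v] - 2 * Lpinv[u, v]

def total_effective_resistance(edges, n):
    Lpinv = np.linalg.pinv(laplacian(edges, n))
    return sum(effective_resistance(Lpinv, u, v) for u, v in combinations(range(n), 2))

def best_edge_to_add(edges, n):
    """Brute force: pick the non-edge whose addition minimizes total effective resistance."""
    existing = {frozenset(e) for e in edges}
    candidates = [e for e in combinations(range(n), 2) if frozenset(e) not in existing]
    return min(candidates, key=lambda e: total_effective_resistance(edges + [e], n))

# Toy example: for the path 0-1-2-3-4, the best shortcut links the two endpoints.
path = [(i, i + 1) for i in range(4)]
print(best_edge_to_add(path, 5))   # (0, 4)
```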


Samantha Chen, Sunhyuk Lim, Facundo Mémoli, Zhengchao Wan, Yusu Wang

In this paper, we present a novel interpretation of the so-called Weisfeiler-Lehman (WL) distance, introduced by Chen et al. (2022), using concepts from stochastic processes. The WL distance, which aims at comparing graphs with node features, has the same discriminative power as the classic Weisfeiler-Lehman graph isomorphism test and has deep connections to the Gromov-Wasserstein distance. This new interpretation connects the WL distance to the literature on distances for stochastic processes, and also makes the distance more accessible and intuitive to interpret. We further explore the connections between the WL distance and certain Message Passing Neural Networks, and discuss the implications of the WL distance for understanding the Lipschitz property and the universal approximation results for these networks.


Chen Cai, Truong Son Hy, Rose Yu, Yusu Wang

The Graph Transformer (GT) has recently emerged as a new paradigm of graph learning algorithms, outperforming the previously popular Message Passing Neural Network (MPNN) on multiple benchmarks. Previous work (Kim et al., 2022) shows that, with proper position embedding, GT can approximate MPNN arbitrarily well, implying that GT is at least as powerful as MPNN. In this paper, we study the inverse connection and show that MPNN with a virtual node (VN), a commonly used heuristic with little theoretical understanding, is powerful enough to arbitrarily approximate the self-attention layer of GT. In particular, we first show that if we consider one type of linear transformer, the so-called Performer/Linear Transformer (Choromanski et al., 2020; Katharopoulos et al., 2020), then MPNN + VN with only $O(1)$ depth and $O(1)$ width can approximate a self-attention layer in Performer/Linear Transformer. Next, via a connection between MPNN + VN and DeepSets, we prove that MPNN + VN with $O(n^d)$ width and $O(1)$ depth can approximate the self-attention layer arbitrarily well, where $d$ is the input feature dimension. Lastly, under some assumptions, we provide an explicit construction of MPNN + VN with $O(1)$ width and $O(n)$ depth approximating the self-attention layer in GT arbitrarily well. On the empirical side, we demonstrate that 1) MPNN + VN is a surprisingly strong baseline, outperforming GT on the recently proposed Long Range Graph Benchmark (LRGB) dataset, 2) our MPNN + VN improves over earlier implementations on a wide range of OGB datasets, and 3) MPNN + VN outperforms Linear Transformer and MPNN on the climate modeling task.
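As a structural illustration of the MPNN + VN heuristic being analyzed (placeholder weights and a deliberately simple update rule, not any of the constructions in the paper), one message-passing layer with a virtual node can be sketched as follows: nodes aggregate messages from their graph neighbors, while the virtual node aggregates a global summary of all nodes and broadcasts it back.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 8                                            # illustrative feature dimension

W_msg = rng.normal(size=(d, d)) / np.sqrt(d)     # placeholder weights
W_vn = rng.normal(size=(d, d)) / np.sqrt(d)

def mpnn_vn_layer(X, A, vn):
    """One message-passing layer with a virtual node.

    X: (n, d) node features, A: (n, n) adjacency matrix, vn: (d,) virtual-node state.
    The virtual node reads a global summary of all nodes and writes it back to every
    node, giving each node access to information beyond its graph neighborhood."""
    vn_new = np.tanh(vn + X.mean(axis=0) @ W_vn)     # virtual node aggregates all nodes
    msgs = A @ X @ W_msg                             # ordinary neighbor aggregation
    X_new = np.tanh(X + msgs + vn_new)               # broadcast the global summary back
    return X_new, vn_new

n = 6
A = (rng.random((n, n)) < 0.4).astype(float)
A = np.triu(A, 1); A = A + A.T                       # random symmetric graph, no self-loops
X, vn = rng.normal(size=(n, d)), np.zeros(d)
X, vn = mpnn_vn_layer(X, A, vn)
```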


Puoya Tabaghi, Michael Khanzadeh, Yusu Wang, Siavash Mirarab

Principal component analysis (PCA) is a workhorse of modern data science. Practitioners typically perform PCA assuming the data conforms to Euclidean geometry. However, for specific data types, such as hierarchical data, other geometrical spaces may be more appropriate. We study PCA in space forms; that is, those with constant positive (spherical) and negative (hyperbolic) curvatures, in addition to zero-curvature (Euclidean) spaces. At any point on a Riemannian manifold, one can define a Riemannian affine subspace based on a set of tangent vectors and use invertible maps to project tangent vectors to the manifold and vice versa. Finding a low-dimensional Riemannian affine subspace for a set of points in a space form amounts to dimensionality reduction because, as we show, any such affine subspace is isometric to a space form of the same dimension and curvature. To find principal components, we seek a (Riemannian) affine subspace that best represents a set of manifold-valued data points with the minimum average cost of projecting data points onto the affine subspace. We propose specific cost functions that bring about two major benefits: (1) the affine subspace can be estimated by solving an eigenequation -- similar to that of Euclidean PCA, and (2) optimal affine subspaces of different dimensions form a nested set. These properties provide advances over existing methods, which are mostly iterative algorithms with slow convergence and weaker theoretical guarantees. Specifically for hyperbolic PCA, the associated eigenequation operates in the Lorentzian space, endowed with an indefinite inner product; we thus establish a connection between Lorentzian and Euclidean eigenequations. We evaluate the proposed space form PCA on data sets simulated in spherical and hyperbolic spaces and show that it outperforms alternative methods in convergence speed or accuracy, and often both.
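For intuition on eigenequation-based subspace fitting in a space form, here is a simple classical baseline for the spherical case (not the cost functions or the Lorentzian eigenequation developed in the paper): fit a great subsphere by taking the top eigenvectors of the second-moment matrix, which maximizes the total squared projection of the data onto the corresponding linear subspace, and then renormalize the projected points back onto the sphere.

```python
import numpy as np

def spherical_subspace_fit(X, k):
    """Fit a k-dimensional great subsphere to unit vectors X (rows on S^{D-1}).

    Classical baseline: the top (k + 1) eigenvectors of the second-moment matrix
    sum_i x_i x_i^T span the linear subspace maximizing the total squared projection
    of the data; intersecting it with the sphere gives a great subsphere, and the
    projected points are renormalized onto it."""
    M = X.T @ X                                   # second-moment matrix
    eigvals, eigvecs = np.linalg.eigh(M)
    V = eigvecs[:, -(k + 1):]                     # basis of the (k + 1)-dim linear subspace
    proj = X @ V @ V.T                            # project onto the subspace ...
    return proj / np.linalg.norm(proj, axis=1, keepdims=True)   # ... and back to the sphere

# Toy example: noisy points near a great circle (k = 1) inside S^2.
rng = np.random.default_rng(3)
t = rng.uniform(0, 2 * np.pi, size=50)
X = np.stack([np.cos(t), np.sin(t), 0.05 * rng.normal(size=50)], axis=1)
X /= np.linalg.norm(X, axis=1, keepdims=True)
Y = spherical_subspace_fit(X, k=1)                # points snapped onto the fitted circle
```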
