Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Andrey Malevich

A Scalable and Effective Alternative to Graph Transformers

Jun 17, 2024

Kaan Sancak, Zhigang Hua, Jin Fang, Yan Xie, Andrey Malevich, Bo Long, Muhammed Fatih Balin, Ümit V. Çatalyürek

Figure 1 for A Scalable and Effective Alternative to Graph Transformers

Figure 2 for A Scalable and Effective Alternative to Graph Transformers

Figure 3 for A Scalable and Effective Alternative to Graph Transformers

Figure 4 for A Scalable and Effective Alternative to Graph Transformers

Abstract:Graph Neural Networks (GNNs) have shown impressive performance in graph representation learning, but they face challenges in capturing long-range dependencies due to their limited expressive power. To address this, Graph Transformers (GTs) were introduced, utilizing self-attention mechanism to effectively model pairwise node relationships. Despite their advantages, GTs suffer from quadratic complexity w.r.t. the number of nodes in the graph, hindering their applicability to large graphs. In this work, we present Graph-Enhanced Contextual Operator (GECO), a scalable and effective alternative to GTs that leverages neighborhood propagation and global convolutions to effectively capture local and global dependencies in quasilinear time. Our study on synthetic datasets reveals that GECO reaches 169x speedup on a graph with 2M nodes w.r.t. optimized attention. Further evaluations on diverse range of benchmarks showcase that GECO scales to large graphs where traditional GTs often face memory and time limitations. Notably, GECO consistently achieves comparable or superior quality compared to baselines, improving the SOTA up to 4.5%, and offering a scalable and effective solution for large-scale graph learning.

* Under submission

Via

Access Paper or Ask Questions

VCR-Graphormer: A Mini-batch Graph Transformer via Virtual Connections

Mar 24, 2024

Dongqi Fu, Zhigang Hua, Yan Xie, Jin Fang, Si Zhang, Kaan Sancak, Hao Wu, Andrey Malevich, Jingrui He, Bo Long

Figure 1 for VCR-Graphormer: A Mini-batch Graph Transformer via Virtual Connections

Figure 2 for VCR-Graphormer: A Mini-batch Graph Transformer via Virtual Connections

Figure 3 for VCR-Graphormer: A Mini-batch Graph Transformer via Virtual Connections

Figure 4 for VCR-Graphormer: A Mini-batch Graph Transformer via Virtual Connections

Abstract:Graph transformer has been proven as an effective graph learning method for its adoption of attention mechanism that is capable of capturing expressive representations from complex topological and feature information of graphs. Graph transformer conventionally performs dense attention (or global attention) for every pair of nodes to learn node representation vectors, resulting in quadratic computational costs that are unaffordable for large-scale graph data. Therefore, mini-batch training for graph transformers is a promising direction, but limited samples in each mini-batch can not support effective dense attention to encode informative representations. Facing this bottleneck, (1) we start by assigning each node a token list that is sampled by personalized PageRank (PPR) and then apply standard multi-head self-attention only on this list to compute its node representations. This PPR tokenization method decouples model training from complex graph topological information and makes heavy feature engineering offline and independent, such that mini-batch training of graph transformers is possible by loading each node's token list in batches. We further prove this PPR tokenization is viable as a graph convolution network with a fixed polynomial filter and jumping knowledge. However, only using personalized PageRank may limit information carried by a token list, which could not support different graph inductive biases for model training. To this end, (2) we rewire graphs by introducing multiple types of virtual connections through structure- and content-based super nodes that enable PPR tokenization to encode local and global contexts, long-range interaction, and heterophilous information into each node's token list, and then formalize our Virtual Connection Ranking based Graph Transformer (VCR-Graphormer).

Via

Access Paper or Ask Questions

Decoupling the Depth and Scope of Graph Neural Networks

Jan 19, 2022

Hanqing Zeng, Muhan Zhang, Yinglong Xia, Ajitesh Srivastava, Andrey Malevich, Rajgopal Kannan, Viktor Prasanna, Long Jin, Ren Chen

Figure 1 for Decoupling the Depth and Scope of Graph Neural Networks

Figure 2 for Decoupling the Depth and Scope of Graph Neural Networks

Figure 3 for Decoupling the Depth and Scope of Graph Neural Networks

Abstract:State-of-the-art Graph Neural Networks (GNNs) have limited scalability with respect to the graph and model sizes. On large graphs, increasing the model depth often means exponential expansion of the scope (i.e., receptive field). Beyond just a few layers, two fundamental challenges emerge: 1. degraded expressivity due to oversmoothing, and 2. expensive computation due to neighborhood explosion. We propose a design principle to decouple the depth and scope of GNNs -- to generate representation of a target entity (i.e., a node or an edge), we first extract a localized subgraph as the bounded-size scope, and then apply a GNN of arbitrary depth on top of the subgraph. A properly extracted subgraph consists of a small number of critical neighbors, while excluding irrelevant ones. The GNN, no matter how deep it is, smooths the local neighborhood into informative representation rather than oversmoothing the global graph into "white noise". Theoretically, decoupling improves the GNN expressive power from the perspectives of graph signal processing (GCN), function approximation (GraphSAGE) and topological learning (GIN). Empirically, on seven graphs (with up to 110M nodes) and six backbone GNN architectures, our design achieves significant accuracy improvement with orders of magnitude reduction in computation and hardware cost.

* Advances in Neural Information Processing Systems, 2021
* Accepted to NeurIPS 2021

Via

Access Paper or Ask Questions

Deep Graph Neural Networks with Shallow Subgraph Samplers

Dec 02, 2020

Hanqing Zeng, Muhan Zhang, Yinglong Xia, Ajitesh Srivastava, Rajgopal Kannan, Viktor Prasanna, Long Jin, Andrey Malevich, Ren Chen

Figure 1 for Deep Graph Neural Networks with Shallow Subgraph Samplers

Figure 2 for Deep Graph Neural Networks with Shallow Subgraph Samplers

Figure 3 for Deep Graph Neural Networks with Shallow Subgraph Samplers

Figure 4 for Deep Graph Neural Networks with Shallow Subgraph Samplers

Abstract:While Graph Neural Networks (GNNs) are powerful models for learning representations on graphs, most state-of-the-art models do not have significant accuracy gain beyond two to three layers. Deep GNNs fundamentally need to address: 1). expressivity challenge due to oversmoothing, and 2). computation challenge due to neighborhood explosion. We propose a simple "deep GNN, shallow sampler" design principle to improve both the GNN accuracy and efficiency -- to generate representation of a target node, we use a deep GNN to pass messages only within a shallow, localized subgraph. A properly sampled subgraph may exclude irrelevant or even noisy nodes, and still preserve the critical neighbor features and graph structures. The deep GNN then smooths the informative local signals to enhance feature learning, rather than oversmoothing the global graph signals into just "white noise". We theoretically justify why the combination of deep GNNs with shallow samplers yields the best learning performance. We then propose various sampling algorithms and neural architecture extensions to achieve good empirical results. Experiments on five large graphs show that our models achieve significantly higher accuracy and efficiency, compared with state-of-the-art.

* 21 pages

Via

Access Paper or Ask Questions

Time-based Sequence Model for Personalization and Recommendation Systems

Aug 27, 2020

Tigran Ishkhanov, Maxim Naumov, Xianjie Chen, Yan Zhu, Yuan Zhong, Alisson Gusatti Azzolini, Chonglin Sun, Frank Jiang, Andrey Malevich, Liang Xiong

Figure 1 for Time-based Sequence Model for Personalization and Recommendation Systems

Figure 2 for Time-based Sequence Model for Personalization and Recommendation Systems

Figure 3 for Time-based Sequence Model for Personalization and Recommendation Systems

Figure 4 for Time-based Sequence Model for Personalization and Recommendation Systems

Abstract:In this paper we develop a novel recommendation model that explicitly incorporates time information. The model relies on an embedding layer and TSL attention-like mechanism with inner products in different vector spaces, that can be thought of as a modification of multi-headed attention. This mechanism allows the model to efficiently treat sequences of user behavior of different length. We study the properties of our state-of-the-art model on statistically designed data set. Also, we show that it outperforms more complex models with longer sequence length on the Taobao User Behavior dataset.

* 17 pages, 7 figures

Via

Access Paper or Ask Questions

Post-Training 4-bit Quantization on Embedding Tables

Nov 05, 2019

Hui Guan, Andrey Malevich, Jiyan Yang, Jongsoo Park, Hector Yuen

Figure 1 for Post-Training 4-bit Quantization on Embedding Tables

Figure 2 for Post-Training 4-bit Quantization on Embedding Tables

Figure 3 for Post-Training 4-bit Quantization on Embedding Tables

Figure 4 for Post-Training 4-bit Quantization on Embedding Tables

Abstract:Continuous representations have been widely adopted in recommender systems where a large number of entities are represented using embedding vectors. As the cardinality of the entities increases, the embedding components can easily contain millions of parameters and become the bottleneck in both storage and inference due to large memory consumption. This work focuses on post-training 4-bit quantization on the continuous embeddings. We propose row-wise uniform quantization with greedy search and codebook-based quantization that consistently outperforms state-of-the-art quantization approaches on reducing accuracy degradation. We deploy our uniform quantization technique on a production model in Facebook and demonstrate that it can reduce the model size to only 13.89% of the single-precision version while the model quality stays neutral.

* Accepted in MLSys@NeurIPS'19 (http://learningsys.org/neurips19/)

Via

Access Paper or Ask Questions

The Architectural Implications of Facebook's DNN-based Personalized Recommendation

Jun 18, 2019

Udit Gupta, Xiaodong Wang, Maxim Naumov, Carole-Jean Wu, Brandon Reagen, David Brooks, Bradford Cottel, Kim Hazelwood, Bill Jia, Hsien-Hsin S. Lee(+5 more)

Figure 1 for The Architectural Implications of Facebook's DNN-based Personalized Recommendation

Figure 2 for The Architectural Implications of Facebook's DNN-based Personalized Recommendation

Figure 3 for The Architectural Implications of Facebook's DNN-based Personalized Recommendation

Figure 4 for The Architectural Implications of Facebook's DNN-based Personalized Recommendation

Abstract:The widespread application of deep learning has changed the landscape of computation in the data center. In particular, personalized recommendation for content ranking is now largely accomplished leveraging deep neural networks. However, despite the importance of these models and the amount of compute cycles they consume, relatively little research attention has been devoted to systems for recommendation. To facilitate research and to advance the understanding of these workloads, this paper presents a set of real-world, production-scale DNNs for personalized recommendation coupled with relevant performance metrics for evaluation. In addition to releasing a set of open-source workloads, we conduct in-depth analysis that underpins future system design and optimization for at-scale recommendation: Inference latency varies by 60% across three Intel server generations, batching and co-location of inferences can drastically improve latency-bounded throughput, and the diverse composition of recommendation models leads to different optimization strategies.

* 11 pages

Via

Access Paper or Ask Questions

Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications

Nov 29, 2018

Jongsoo Park, Maxim Naumov, Protonu Basu, Summer Deng, Aravind Kalaiah, Daya Khudia, James Law, Parth Malani, Andrey Malevich, Satish Nadathur(+18 more)

Figure 1 for Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications

Figure 2 for Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications

Figure 3 for Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications

Figure 4 for Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications

Abstract:The application of deep learning techniques resulted in remarkable improvement of machine learning models. In this paper provides detailed characterizations of deep learning models used in many Facebook social network services. We present computational characteristics of our models, describe high performance optimizations targeting existing systems, point out their limitations and make suggestions for the future general-purpose/accelerated inference hardware. Also, we highlight the need for better co-design of algorithms, numerics and computing platforms to address the challenges of workloads often run in data centers.

Via

Access Paper or Ask Questions