Christopher Lott

Moccasin: Efficient Tensor Rematerialization for Neural Networks

Apr 27, 2023
Burak Bartan, Haoming Li, Harris Teague, Christopher Lott, Bistra Dilkina

The deployment and training of neural networks on edge computing devices pose many challenges. The limited memory of edge devices is often one of the biggest obstacles to deploying large neural network models. Tensor rematerialization, or recompute, is a way to address the high memory requirements of neural network training and inference. In this paper, we consider the problem of minimizing the execution time of a compute graph subject to a memory budget. In particular, we develop a new constraint programming formulation called \textsc{Moccasin} with only $O(n)$ integer variables, where $n$ is the number of nodes in the compute graph. This is a significant improvement over recent works that propose formulations with $O(n^2)$ Boolean variables. We present numerical studies showing that our approach is up to an order of magnitude faster than recent work, especially on large-scale graphs.
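
The trade-off the abstract describes, spending extra compute to lower peak memory under a budget, can be sketched as a small constraint program. The toy CP-SAT sketch below is not the paper's actual \textsc{Moccasin} formulation (which models each tensor's retention span with $O(n)$ integer variables); it assumes a fixed execution schedule, gives each tensor exactly two uses and two candidate lifetimes, and uses entirely hypothetical data values.

```python
# Toy rematerialization-under-a-memory-budget sketch with CP-SAT.
# Hypothetical data; NOT the actual MOCCASIN formulation from the paper.
from ortools.sat.python import cp_model

# One entry per tensor: (produce_step, first_use, last_use, size, recompute_cost).
tensors = [
    (0, 1, 5, 4, 3),
    (1, 2, 4, 2, 1),
    (2, 3, 6, 3, 2),
]
MEMORY_BUDGET = 6

model = cp_model.CpModel()
intervals, demands, remat_costs = [], [], []

for i, (p, f, u, size, cost) in enumerate(tensors):
    keep = model.NewBoolVar(f"keep_{i}")
    # Option A: keep the tensor resident from production through its last use.
    intervals.append(
        model.NewOptionalIntervalVar(p, u + 1 - p, u + 1, keep, f"live_{i}"))
    demands.append(size)
    # Option B: free it after the first use, recompute it for the last use.
    intervals.append(
        model.NewOptionalIntervalVar(p, f + 1 - p, f + 1, keep.Not(), f"head_{i}"))
    demands.append(size)
    intervals.append(
        model.NewOptionalIntervalVar(u, 1, u + 1, keep.Not(), f"remat_{i}"))
    demands.append(size)
    remat_costs.append(cost * keep.Not())

# Peak memory at every step must stay within the budget.
model.AddCumulative(intervals, demands, MEMORY_BUDGET)
# Minimize the extra compute spent on rematerialization.
model.Minimize(sum(remat_costs))

solver = cp_model.CpSolver()
if solver.Solve(model) in (cp_model.OPTIMAL, cp_model.FEASIBLE):
    print("extra recompute cost:", solver.ObjectiveValue())
```

Here the cumulative constraint plays the role of the per-step memory check: the solver picks which tensors to keep resident and which to free and rematerialize so that total live tensor size never exceeds the budget.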

Neural Topological Ordering for Computation Graphs

Jul 13, 2022
Mukul Gagrani, Corrado Rainone, Yang Yang, Harris Teague, Wonseok Jeon, Herke Van Hoof, Weiliang Will Zeng, Piero Zappi, Christopher Lott, Roberto Bondesan

Recent works on machine learning for combinatorial optimization have shown that learning-based approaches can outperform heuristic methods in both speed and solution quality. In this paper, we consider the problem of finding an optimal topological order on a directed acyclic graph, with a focus on the memory-minimization problem that arises in compilers. We propose an end-to-end machine learning approach for topological ordering using an encoder-decoder framework. Our encoder is a novel attention-based graph neural network architecture called \emph{Topoformer}, which uses different topological transforms of a DAG for message passing. The node embeddings produced by the encoder are converted into node priorities, which the decoder uses to generate a probability distribution over topological orders. We train our model on a dataset of synthetically generated graphs called layered graphs. We show that our model outperforms, or is on par with, several topological-ordering baselines while being significantly faster on synthetic graphs with up to 2k nodes. We also train and test our model on a set of real-world computation graphs, showing performance improvements.
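
To make the decoding step concrete, here is a minimal sketch of priority-based topological ordering: given one score per node (a stand-in for the encoder's output; the paper's decoder defines a probability distribution over orders, whereas this sketch simply decodes greedily), repeatedly emit the schedulable node with the highest priority. All names and data below are illustrative.

```python
# Minimal priority-based decoding of a topological order.
# The scores are random stand-ins for a learned encoder's node priorities.
import heapq
import random

def decode_topological_order(num_nodes, edges, priority):
    """Greedily emit the ready (all-predecessors-done) node with highest priority."""
    succ = [[] for _ in range(num_nodes)]
    indegree = [0] * num_nodes
    for u, v in edges:
        succ[u].append(v)
        indegree[v] += 1
    # Max-heap via negated priorities (heapq is a min-heap).
    ready = [(-priority[v], v) for v in range(num_nodes) if indegree[v] == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        _, u = heapq.heappop(ready)
        order.append(u)
        for v in succ[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                heapq.heappush(ready, (-priority[v], v))
    assert len(order) == num_nodes, "input graph has a cycle"
    return order

# Example DAG: 0 -> {1, 2}, {1, 2} -> 3.
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
scores = [random.random() for _ in range(4)]  # stand-in for GNN output
print(decode_topological_order(4, edges, scores))
```

Any assignment of scores yields a valid topological order under this decoder; the learning problem is to produce scores whose decoded order minimizes the downstream objective, such as peak memory.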
