Chien-chin Huang

Supporting Very Large Models using Automatic Dataflow Graph Partitioning

Jul 24, 2018

Minjie Wang, Chien-chin Huang, Jinyang Li

Abstract: There is a trend towards using very large deep neural networks (DNNs) to improve the accuracy of complex machine learning tasks. However, the size of DNN models that can be explored today is limited by the amount of GPU device memory. This paper presents Tofu, a system for partitioning very large DNN models across multiple GPU devices. Tofu is designed for a tensor-based dataflow system: for each operator in the dataflow graph, it partitions its input/output tensors and parallelizes its execution across workers. Tofu can automatically discover how each operator can be partitioned by analyzing its semantics expressed in a simple specification language. Tofu uses a search algorithm based on dynamic programming to determine the best partition strategy for each operator in the entire dataflow graph. Our experiments on an 8-GPU machine show that Tofu enables the training of very large CNN and RNN models. It also achieves better performance than alternative approaches to train very large models on multiple GPUs.
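
The dynamic-programming search mentioned in the abstract can be illustrated with a toy sketch. This is not Tofu's algorithm: it assumes a linear chain of operators rather than a general dataflow graph, and the strategy names (`"row"`, `"col"`), the per-operator costs, and the re-partitioning costs are all invented for illustration. The function `best_partition_plan` is likewise a hypothetical name.

```python
from typing import Dict, List, Tuple

# Hypothetical strategy labels and costs, invented for illustration only.
Strategy = str


def best_partition_plan(
    op_costs: List[Dict[Strategy, float]],
    convert_cost: Dict[Tuple[Strategy, Strategy], float],
) -> Tuple[float, List[Strategy]]:
    """Pick one strategy per operator in a linear chain, minimizing total cost."""
    # best[s] = (cheapest cost of any plan ending in strategy s, that plan)
    best = {s: (c, [s]) for s, c in op_costs[0].items()}
    for costs in op_costs[1:]:
        nxt = {}
        for s, c in costs.items():
            # Extend the cheapest predecessor plan, paying to re-partition
            # the tensor flowing between the two operators if needed.
            prev_cost, prev_plan = min(
                ((pc + convert_cost[(p, s)], plan) for p, (pc, plan) in best.items()),
                key=lambda t: t[0],
            )
            nxt[s] = (prev_cost + c, prev_plan + [s])
        best = nxt
    return min(best.values(), key=lambda t: t[0])


# Three operators, two candidate strategies each (made-up numbers).
op_costs = [
    {"row": 4.0, "col": 1.0},
    {"row": 1.0, "col": 5.0},
    {"row": 2.0, "col": 3.0},
]
convert_cost = {
    ("row", "row"): 0.0, ("col", "col"): 0.0,
    ("row", "col"): 2.0, ("col", "row"): 2.0,
}
total, plan = best_partition_plan(op_costs, convert_cost)
print(total, plan)
```

The point of the sketch is only that the per-operator choices interact through the cost of converting between partitionings, so the cheapest strategy for one operator in isolation is not always part of the cheapest overall plan.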

Unifying Data, Model and Hybrid Parallelism in Deep Learning via Tensor Tiling

May 10, 2018

Minjie Wang, Chien-chin Huang, Jinyang Li

Abstract: Deep learning systems have become vital tools across many fields, but the increasing model sizes mean that training must be accelerated to maintain such systems' utility. Current systems like TensorFlow and MXNet focus on one specific parallelization strategy, data parallelism, which requires large training batch sizes in order to scale. We cast the problem of finding the best parallelization strategy as the problem of finding the best tiling to partition tensors with the least overall communication. We propose an algorithm that can find the optimal tiling. Our resulting parallelization solution is a hybrid of data parallelism and model parallelism. We build the SoyBean system that performs automatic parallelization. SoyBean automatically transforms a serial dataflow graph captured by an existing deep learning system frontend into a parallel dataflow graph based on the optimal tiling it has found. Our evaluations show that SoyBean is 1.5x-4x faster than pure data parallelism for AlexNet and VGG. We present this automatic tiling in a new system, SoyBean, that can act as a backend for TensorFlow, MXNet, and others.
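
To make the idea of tiling concrete, here is a minimal sketch, assuming a single matrix multiplication `y = x @ w` as the operator. It is not SoyBean code; the function names and the simulated "workers" (plain loop iterations) are hypothetical. Splitting the rows of `x` corresponds roughly to data parallelism, and splitting the columns of `w` corresponds roughly to model parallelism. Both tilings reproduce the serial result.

```python
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, w: Matrix) -> Matrix:
    """Serial reference implementation of y = x @ w."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def row_tiled_matmul(x: Matrix, w: Matrix, workers: int) -> Matrix:
    """Data-parallel style: each worker gets a slice of x's rows and all of w."""
    step = len(x) // workers  # assumes the row count divides evenly
    out: Matrix = []
    for i in range(workers):
        out.extend(matmul(x[i * step:(i + 1) * step], w))
    return out


def col_tiled_matmul(x: Matrix, w: Matrix, workers: int) -> Matrix:
    """Model-parallel style: each worker gets a slice of w's columns and all of x."""
    step = len(w[0]) // workers  # assumes the column count divides evenly
    tiles = [
        matmul(x, [row[i * step:(i + 1) * step] for row in w])
        for i in range(workers)
    ]
    # Stitch the per-worker output tiles back together column-wise.
    return [sum((tile[r] for tile in tiles), []) for r in range(len(x))]


x = [[1, 2], [3, 4], [5, 6], [7, 8]]   # 4 x 2 input batch
w = [[1, 0, 2, 1], [0, 1, 1, 3]]       # 2 x 4 weight matrix
serial = matmul(x, w)
by_rows = row_tiled_matmul(x, w, workers=2)
by_cols = col_tiled_matmul(x, w, workers=2)
print(serial == by_rows == by_cols)
```

Since the two tilings compute the same output, the choice between them comes down to how much data each one forces workers to exchange, which depends on the tensor shapes. That choice is what the abstract describes the tiling search as optimizing.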
