
MAP: Memory-aware Automated Intra-op Parallel Training For Foundation Models

Feb 06, 2023
Yuliang Liu, Shenggui Li, Jiarui Fang, Yanjun Shao, Boyuan Yao, Yang You



Recently, large models have achieved state-of-the-art performance in various fields. Training such models requires distributed training techniques. However, finding an efficient distributed execution plan not only requires fine-grained model statistics, such as the memory and computing overhead of each operator, but is also a labor-intensive task even for an expert in the field of distributed training. In this paper, we introduce MAP, a compiler built upon PyTorch to implement Memory-aware Automated Parallelization. To profile operator costs, existing training systems and machine learning pipelines either physically execute the model with respect to each operand or estimate the memory usage with a scaled input tensor; both approaches are often time-consuming and misleading. Compared with existing methods, MAP provides an easy-to-use symbolic profiler that generates memory and computing statistics for an arbitrary PyTorch model at trivial time cost, which improves productivity for ML developers. In addition, MAP can seamlessly speed up different static planning tasks on computation graphs for PyTorch, and it requires only a few lines of modification to user code to generate a new module instance that has a top-performing distributed execution plan. The source code is publicly available at https://github.com/hpcaitech/ColossalAI
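
To make the idea of symbolic profiling concrete, the following is a minimal sketch in plain Python, not MAP's or ColossalAI's actual API. It assumes a model made only of dense layers stored in float32, and it derives output shapes, parameter memory, activation memory, and FLOPs from shapes alone, without allocating or executing any tensor. The names `LinearSpec`, `OpStats`, `profile_linear`, and `profile_model` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from math import prod

BYTES_PER_FLOAT32 = 4  # assumption: all tensors are stored in float32


@dataclass(frozen=True)
class LinearSpec:
    """Hypothetical description of a dense layer y = x @ W^T + b."""
    in_features: int
    out_features: int


@dataclass(frozen=True)
class OpStats:
    """Statistics derived from shapes only; nothing is executed."""
    output_shape: tuple
    param_bytes: int
    activation_bytes: int
    flops: int


def profile_linear(spec, input_shape):
    """Estimate the cost of one dense layer from its input shape."""
    if input_shape[-1] != spec.in_features:
        raise ValueError("last input dimension must equal in_features")
    output_shape = (*input_shape[:-1], spec.out_features)
    rows = prod(input_shape[:-1])  # number of input vectors in the batch
    param_count = spec.in_features * spec.out_features + spec.out_features
    # Count each multiply-add of the matrix product as 2 FLOPs;
    # the bias addition is ignored for simplicity.
    flops = 2 * rows * spec.in_features * spec.out_features
    return OpStats(
        output_shape=output_shape,
        param_bytes=param_count * BYTES_PER_FLOAT32,
        activation_bytes=prod(output_shape) * BYTES_PER_FLOAT32,
        flops=flops,
    )


def profile_model(layers, input_shape):
    """Propagate shapes through a chain of layers and collect statistics."""
    stats = []
    shape = tuple(input_shape)
    for spec in layers:
        layer_stats = profile_linear(spec, shape)
        stats.append(layer_stats)
        shape = layer_stats.output_shape
    return stats


layers = [LinearSpec(1024, 4096), LinearSpec(4096, 1024)]
stats = profile_model(layers, (8, 1024))
total_param_bytes = sum(s.param_bytes for s in stats)
```

Because only shapes are propagated, the cost of this kind of profiling does not depend on the size of the tensors involved, which is the property the abstract attributes to a symbolic profiler. A real system would need per-operator rules for every operator in the computation graph rather than the single dense-layer rule assumed here.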
