Xiaonan Nie

Improving Automatic Parallel Training via Balanced Memory Workload Optimization

Jul 05, 2023
Yujie Wang, Youhe Jiang, Xupeng Miao, Fangcheng Fu, Xiaonan Nie, Bin Cui


FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training via Dynamic Device Placement

Apr 08, 2023
Xiaonan Nie, Xupeng Miao, Zilong Wang, Zichao Yang, Jilong Xue, Lingxiao Ma, Gang Cao, Bin Cui


Angel-PTM: A Scalable and Economical Large-scale Pre-training System in Tencent

Mar 06, 2023
Xiaonan Nie, Yi Liu, Fangcheng Fu, Jinbao Xue, Dian Jiao, Xupeng Miao, Yangyu Tao, Bin Cui


Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism

Nov 25, 2022
Xupeng Miao, Yujie Wang, Youhe Jiang, Chunan Shi, Xiaonan Nie, Hailin Zhang, Bin Cui


Dense-to-Sparse Gate for Mixture-of-Experts

Dec 29, 2021
Xiaonan Nie, Shijie Cao, Xupeng Miao, Lingxiao Ma, Jilong Xue, Youshan Miao, Zichao Yang, Zhi Yang, Bin Cui


HET: Scaling out Huge Embedding Model Training via Cache-enabled Distributed Framework

Dec 14, 2021
Xupeng Miao, Hailin Zhang, Yining Shi, Xiaonan Nie, Zhi Yang, Yangyu Tao, Bin Cui
