Yuxiong He

Scalable and Efficient MoE Training for Multitask Multilingual Models

Sep 22, 2021
Young Jin Kim, Ammar Ahmad Awan, Alexandre Muzio, Andres Felipe Cruz Salinas, Liyang Lu, Amr Hendy, Samyam Rajbhandari, Yuxiong He, Hany Hassan Awadalla

Curriculum Learning: A Regularization Method for Efficient and Stable Billion-Scale GPT Model Pre-Training

Aug 13, 2021
Conglong Li, Minjia Zhang, Yuxiong He

ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning

Apr 16, 2021
Samyam Rajbhandari, Olatunji Ruwase, Jeff Rasley, Shaden Smith, Yuxiong He

1-bit LAMB: Communication Efficient Large-Scale Large-Batch Training with LAMB's Convergence Speed

Apr 13, 2021
Conglong Li, Ammar Ahmad Awan, Hanlin Tang, Samyam Rajbhandari, Yuxiong He

1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed

Feb 04, 2021
Hanlin Tang, Shaoduo Gan, Ammar Ahmad Awan, Samyam Rajbhandari, Conglong Li, Xiangru Lian, Ji Liu, Ce Zhang, Yuxiong He

ZeRO-Offload: Democratizing Billion-Scale Model Training

Jan 18, 2021
Jie Ren, Samyam Rajbhandari, Reza Yazdani Aminabadi, Olatunji Ruwase, Shuangyan Yang, Minjia Zhang, Dong Li, Yuxiong He

Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping

Oct 26, 2020
Minjia Zhang, Yuxiong He

APMSqueeze: A Communication Efficient Adam-Preconditioned Momentum SGD Algorithm

Aug 28, 2020
Hanlin Tang, Shaoduo Gan, Samyam Rajbhandari, Xiangru Lian, Ji Liu, Yuxiong He, Ce Zhang

LSTM-Sharp: An Adaptable, Energy-Efficient Hardware Accelerator for Long Short-Term Memory

Nov 04, 2019
Reza Yazdani, Olatunji Ruwase, Minjia Zhang, Yuxiong He, Jose-Maria Arnau, Antonio Gonzalez

ZeRO: Memory Optimization Towards Training A Trillion Parameter Models

Oct 07, 2019
Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, Yuxiong He
