Samyam Rajbhandari

Scalable and Efficient MoE Training for Multitask Multilingual Models

Sep 22, 2021
Young Jin Kim, Ammar Ahmad Awan, Alexandre Muzio, Andres Felipe Cruz Salinas, Liyang Lu, Amr Hendy, Samyam Rajbhandari, Yuxiong He

ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning

Apr 16, 2021
Samyam Rajbhandari, Olatunji Ruwase, Jeff Rasley, Shaden Smith, Yuxiong He

1-bit LAMB: Communication Efficient Large-Scale Large-Batch Training with LAMB's Convergence Speed

Apr 13, 2021
Conglong Li, Ammar Ahmad Awan, Hanlin Tang, Samyam Rajbhandari, Yuxiong He

1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed

Feb 04, 2021
Hanlin Tang, Shaoduo Gan, Ammar Ahmad Awan, Samyam Rajbhandari, Conglong Li, Xiangru Lian, Ji Liu, Ce Zhang, Yuxiong He

ZeRO-Offload: Democratizing Billion-Scale Model Training

Jan 18, 2021
Jie Ren, Samyam Rajbhandari, Reza Yazdani Aminabadi, Olatunji Ruwase, Shuangyan Yang, Minjia Zhang, Dong Li, Yuxiong He

APMSqueeze: A Communication Efficient Adam-Preconditioned Momentum SGD Algorithm

Aug 28, 2020
Hanlin Tang, Shaoduo Gan, Samyam Rajbhandari, Xiangru Lian, Ji Liu, Yuxiong He, Ce Zhang

ZeRO: Memory Optimization Towards Training A Trillion Parameter Models

Oct 07, 2019
Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, Yuxiong He

AntMan: Sparse Low-Rank Compression to Accelerate RNN inference

Oct 02, 2019
Samyam Rajbhandari, Harsh Shrivastava, Yuxiong He