Conglong Li

DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale

Jan 14, 2022
Via arXiv

Curriculum Learning: A Regularization Method for Efficient and Stable Billion-Scale GPT Model Pre-Training

Aug 13, 2021
Via arXiv

1-bit LAMB: Communication Efficient Large-Scale Large-Batch Training with LAMB's Convergence Speed

Apr 13, 2021
Via arXiv

1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed

Feb 04, 2021
Via arXiv

Scaling Video Analytics on Constrained Edge Nodes

May 24, 2019
Via arXiv