Yuxiong He

ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers
Jun 04, 2022

Extreme Compression for Pre-trained Transformers Made Simple and Efficient
Jun 04, 2022

Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam
Feb 12, 2022

Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
Feb 04, 2022

ScaLA: Accelerating Adaptation of Pre-Trained Transformer-Based Language Models via Efficient Large-Batch Adversarial Noise
Jan 29, 2022

DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale
Jan 14, 2022

NxMTransformer: Semi-Structured Sparsification for Natural Language Understanding via ADMM
Oct 28, 2021

Scalable and Efficient MoE Training for Multitask Multilingual Models
Sep 22, 2021

Curriculum Learning: A Regularization Method for Efficient and Stable Billion-Scale GPT Model Pre-Training
Aug 13, 2021

ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning
Apr 16, 2021