
Yuxiong He

DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing

Dec 07, 2022

Random-LTD: Random and Layerwise Token Dropping Brings Efficient Training for Large-scale Transformers

Nov 17, 2022

BiFeat: Supercharge GNN Training via Graph Feature Quantization

Jul 29, 2022

DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale

Jun 30, 2022

Compressing Pre-trained Transformers via Low-Bit NxM Sparsity for Natural Language Understanding

Jun 30, 2022

ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers

Jun 04, 2022

Extreme Compression for Pre-trained Transformers Made Simple and Efficient

Jun 04, 2022

Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam

Feb 12, 2022

Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model

Feb 04, 2022

ScaLA: Accelerating Adaptation of Pre-Trained Transformer-Based Language Models via Efficient Large-Batch Adversarial Noise

Jan 29, 2022