Yuxiong He

DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing

Dec 07, 2022
Conglong Li, Zhewei Yao, Xiaoxia Wu, Minjia Zhang, Yuxiong He

Random-LTD: Random and Layerwise Token Dropping Brings Efficient Training for Large-scale Transformers

Nov 17, 2022
Zhewei Yao, Xiaoxia Wu, Conglong Li, Connor Holmes, Minjia Zhang, Cheng Li, Yuxiong He

BiFeat: Supercharge GNN Training via Graph Feature Quantization

Jul 29, 2022
Yuxin Ma, Ping Gong, Jun Yi, Zhewei Yao, Minjie Wang, Cheng Li, Yuxiong He, Feng Yan

DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale

Jun 30, 2022
Reza Yazdani Aminabadi, Samyam Rajbhandari, Minjia Zhang, Ammar Ahmad Awan, Cheng Li, Du Li, Elton Zheng, Jeff Rasley, Shaden Smith, Olatunji Ruwase, Yuxiong He

Compressing Pre-trained Transformers via Low-Bit NxM Sparsity for Natural Language Understanding

Jun 30, 2022
Connor Holmes, Minjia Zhang, Yuxiong He, Bo Wu

ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers

Jun 04, 2022
Zhewei Yao, Reza Yazdani Aminabadi, Minjia Zhang, Xiaoxia Wu, Conglong Li, Yuxiong He

Extreme Compression for Pre-trained Transformers Made Simple and Efficient

Jun 04, 2022
Xiaoxia Wu, Zhewei Yao, Minjia Zhang, Conglong Li, Yuxiong He

Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam

Feb 12, 2022
Yucheng Lu, Conglong Li, Minjia Zhang, Christopher De Sa, Yuxiong He

Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model

Feb 04, 2022
Shaden Smith, Mostofa Patwary, Brandon Norick, Patrick LeGresley, Samyam Rajbhandari, Jared Casper, Zhun Liu, Shrimai Prabhumoye, George Zerveas, Vijay Korthikanti, Elton Zhang, Rewon Child, Reza Yazdani Aminabadi, Julie Bernauer, Xia Song, Mohammad Shoeybi, Yuxiong He, Michael Houston, Saurabh Tiwary, Bryan Catanzaro

ScaLA: Accelerating Adaptation of Pre-Trained Transformer-Based Language Models via Efficient Large-Batch Adversarial Noise

Jan 29, 2022
Minjia Zhang, Niranjan Uma Naresh, Yuxiong He
