
Understanding INT4 Quantization for Transformer Models: Latency Speedup, Composability, and Failure Cases


Jan 27, 2023
Xiaoxia Wu, Cheng Li, Reza Yazdani Aminabadi, Zhewei Yao, Yuxiong He


DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing


Dec 07, 2022
Conglong Li, Zhewei Yao, Xiaoxia Wu, Minjia Zhang, Yuxiong He


Random-LTD: Random and Layerwise Token Dropping Brings Efficient Training for Large-scale Transformers


Nov 17, 2022
Zhewei Yao, Xiaoxia Wu, Conglong Li, Connor Holmes, Minjia Zhang, Cheng Li, Yuxiong He

* 22 pages 


BiFeat: Supercharge GNN Training via Graph Feature Quantization


Jul 29, 2022
Yuxin Ma, Ping Gong, Jun Yi, Zhewei Yao, Minjie Wang, Cheng Li, Yuxiong He, Feng Yan


ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers


Jun 04, 2022
Zhewei Yao, Reza Yazdani Aminabadi, Minjia Zhang, Xiaoxia Wu, Conglong Li, Yuxiong He

* 11 pages, 4 figures 


Extreme Compression for Pre-trained Transformers Made Simple and Efficient


Jun 04, 2022
Xiaoxia Wu, Zhewei Yao, Minjia Zhang, Conglong Li, Yuxiong He

* 10 pages, 4 figures 


DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale


Jan 14, 2022
Samyam Rajbhandari, Conglong Li, Zhewei Yao, Minjia Zhang, Reza Yazdani Aminabadi, Ammar Ahmad Awan, Jeff Rasley, Yuxiong He


What's Hidden in a One-layer Randomly Weighted Transformer?


Sep 08, 2021
Sheng Shen, Zhewei Yao, Douwe Kiela, Kurt Keutzer, Michael W. Mahoney

* EMNLP 2021 (short) 


How Much Can CLIP Benefit Vision-and-Language Tasks?


Jul 13, 2021
Sheng Shen, Liunian Harold Li, Hao Tan, Mohit Bansal, Anna Rohrbach, Kai-Wei Chang, Zhewei Yao, Kurt Keutzer

* 14 pages 


MLPruning: A Multilevel Structured Pruning Framework for Transformer-based Models


May 30, 2021
Zhewei Yao, Linjian Ma, Sheng Shen, Kurt Keutzer, Michael W. Mahoney

* 20 pages, 4 figures, 9 tables 
