Alert button
Picture for Elton Zheng

Elton Zheng

Alert button

ZeroQuant-HERO: Hardware-Enhanced Robust Optimized Post-Training Quantization Framework for W8A8 Transformers

Oct 26, 2023
Zhewei Yao, Reza Yazdani Aminabadi, Stephen Youn, Xiaoxia Wu, Elton Zheng, Yuxiong He

Viaarxiv icon

DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale

Jun 30, 2022
Reza Yazdani Aminabadi, Samyam Rajbhandari, Minjia Zhang, Ammar Ahmad Awan, Cheng Li, Du Li, Elton Zheng, Jeff Rasley, Shaden Smith, Olatunji Ruwase, Yuxiong He

Figure 1 for DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale
Figure 2 for DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale
Figure 3 for DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale
Figure 4 for DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale
Viaarxiv icon