Alert button
Picture for Reza Yazdani Aminabadi

Reza Yazdani Aminabadi

Alert button

DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference

Jan 09, 2024
Connor Holmes, Masahiro Tanaka, Michael Wyatt, Ammar Ahmad Awan, Jeff Rasley, Samyam Rajbhandari, Reza Yazdani Aminabadi, Heyang Qin, Arash Bakhtiari, Lev Kurilenko, Yuxiong He

Viaarxiv icon

ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks

Dec 18, 2023
Xiaoxia Wu, Haojun Xia, Stephen Youn, Zhen Zheng, Shiyang Chen, Arash Bakhtiari, Michael Wyatt, Reza Yazdani Aminabadi, Yuxiong He, Olatunji Ruwase, Leon Song, Zhewei Yao

Figure 1 for ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks
Figure 2 for ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks
Figure 3 for ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks
Figure 4 for ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks
Viaarxiv icon

ZeroQuant-HERO: Hardware-Enhanced Robust Optimized Post-Training Quantization Framework for W8A8 Transformers

Oct 26, 2023
Zhewei Yao, Reza Yazdani Aminabadi, Stephen Youn, Xiaoxia Wu, Elton Zheng, Yuxiong He

Viaarxiv icon

DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales

Aug 02, 2023
Zhewei Yao, Reza Yazdani Aminabadi, Olatunji Ruwase, Samyam Rajbhandari, Xiaoxia Wu, Ammar Ahmad Awan, Jeff Rasley, Minjia Zhang, Conglong Li, Connor Holmes, Zhongzhu Zhou, Michael Wyatt, Molly Smith, Lev Kurilenko, Heyang Qin, Masahiro Tanaka, Shuai Che, Shuaiwen Leon Song, Yuxiong He

Figure 1 for DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales
Figure 2 for DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales
Figure 3 for DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales
Figure 4 for DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales
Viaarxiv icon

Understanding INT4 Quantization for Transformer Models: Latency Speedup, Composability, and Failure Cases

Jan 27, 2023
Xiaoxia Wu, Cheng Li, Reza Yazdani Aminabadi, Zhewei Yao, Yuxiong He

Figure 1 for Understanding INT4 Quantization for Transformer Models: Latency Speedup, Composability, and Failure Cases
Figure 2 for Understanding INT4 Quantization for Transformer Models: Latency Speedup, Composability, and Failure Cases
Figure 3 for Understanding INT4 Quantization for Transformer Models: Latency Speedup, Composability, and Failure Cases
Figure 4 for Understanding INT4 Quantization for Transformer Models: Latency Speedup, Composability, and Failure Cases
Viaarxiv icon

DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale

Jun 30, 2022
Reza Yazdani Aminabadi, Samyam Rajbhandari, Minjia Zhang, Ammar Ahmad Awan, Cheng Li, Du Li, Elton Zheng, Jeff Rasley, Shaden Smith, Olatunji Ruwase, Yuxiong He

Figure 1 for DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale
Figure 2 for DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale
Figure 3 for DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale
Figure 4 for DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale
Viaarxiv icon

ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers

Jun 04, 2022
Zhewei Yao, Reza Yazdani Aminabadi, Minjia Zhang, Xiaoxia Wu, Conglong Li, Yuxiong He

Figure 1 for ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers
Figure 2 for ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers
Figure 3 for ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers
Figure 4 for ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers
Viaarxiv icon

Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model

Feb 04, 2022
Shaden Smith, Mostofa Patwary, Brandon Norick, Patrick LeGresley, Samyam Rajbhandari, Jared Casper, Zhun Liu, Shrimai Prabhumoye, George Zerveas, Vijay Korthikanti, Elton Zhang, Rewon Child, Reza Yazdani Aminabadi, Julie Bernauer, Xia Song, Mohammad Shoeybi, Yuxiong He, Michael Houston, Saurabh Tiwary, Bryan Catanzaro

Figure 1 for Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
Figure 2 for Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
Figure 3 for Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
Figure 4 for Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
Viaarxiv icon

DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale

Jan 14, 2022
Samyam Rajbhandari, Conglong Li, Zhewei Yao, Minjia Zhang, Reza Yazdani Aminabadi, Ammar Ahmad Awan, Jeff Rasley, Yuxiong He

Figure 1 for DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale
Figure 2 for DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale
Figure 3 for DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale
Figure 4 for DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale
Viaarxiv icon

ZeRO-Offload: Democratizing Billion-Scale Model Training

Jan 18, 2021
Jie Ren, Samyam Rajbhandari, Reza Yazdani Aminabadi, Olatunji Ruwase, Shuangyan Yang, Minjia Zhang, Dong Li, Yuxiong He

Figure 1 for ZeRO-Offload: Democratizing Billion-Scale Model Training
Figure 2 for ZeRO-Offload: Democratizing Billion-Scale Model Training
Figure 3 for ZeRO-Offload: Democratizing Billion-Scale Model Training
Figure 4 for ZeRO-Offload: Democratizing Billion-Scale Model Training
Viaarxiv icon