Zhewei Yao

A Comprehensive Study on Post-Training Quantization for Large Language Models
Mar 16, 2023
Zhewei Yao, Cheng Li, Xiaoxia Wu, Stephen Youn, Yuxiong He

Scaling Vision-Language Models with Sparse Mixture of Experts
Mar 13, 2023
Sheng Shen, Zhewei Yao, Chunyuan Li, Trevor Darrell, Kurt Keutzer, Yuxiong He

Understanding INT4 Quantization for Transformer Models: Latency Speedup, Composability, and Failure Cases
Jan 27, 2023
Xiaoxia Wu, Cheng Li, Reza Yazdani Aminabadi, Zhewei Yao, Yuxiong He

DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing
Dec 07, 2022
Conglong Li, Zhewei Yao, Xiaoxia Wu, Minjia Zhang, Yuxiong He

Random-LTD: Random and Layerwise Token Dropping Brings Efficient Training for Large-scale Transformers
Nov 17, 2022
Zhewei Yao, Xiaoxia Wu, Conglong Li, Connor Holmes, Minjia Zhang, Cheng Li, Yuxiong He

BiFeat: Supercharge GNN Training via Graph Feature Quantization
Jul 29, 2022
Yuxin Ma, Ping Gong, Jun Yi, Zhewei Yao, Minjie Wang, Cheng Li, Yuxiong He, Feng Yan

ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers
Jun 04, 2022
Zhewei Yao, Reza Yazdani Aminabadi, Minjia Zhang, Xiaoxia Wu, Conglong Li, Yuxiong He

Extreme Compression for Pre-trained Transformers Made Simple and Efficient
Jun 04, 2022
Xiaoxia Wu, Zhewei Yao, Minjia Zhang, Conglong Li, Yuxiong He

DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale
Jan 14, 2022
Samyam Rajbhandari, Conglong Li, Zhewei Yao, Minjia Zhang, Reza Yazdani Aminabadi, Ammar Ahmad Awan, Jeff Rasley, Yuxiong He
