Zhewei Yao

Understanding INT4 Quantization for Transformer Models: Latency Speedup, Composability, and Failure Cases

Jan 27, 2023

DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing

Dec 07, 2022

Random-LTD: Random and Layerwise Token Dropping Brings Efficient Training for Large-scale Transformers

Nov 17, 2022

BiFeat: Supercharge GNN Training via Graph Feature Quantization

Jul 29, 2022

ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers

Jun 04, 2022

Extreme Compression for Pre-trained Transformers Made Simple and Efficient

Jun 04, 2022

DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale

Jan 14, 2022

What's Hidden in a One-layer Randomly Weighted Transformer?

Sep 08, 2021

How Much Can CLIP Benefit Vision-and-Language Tasks?

Jul 13, 2021

MLPruning: A Multilevel Structured Pruning Framework for Transformer-based Models

May 30, 2021