
Zhewei Yao

BiFeat: Supercharge GNN Training via Graph Feature Quantization

Jul 29, 2022

ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers

Jun 04, 2022

Extreme Compression for Pre-trained Transformers Made Simple and Efficient

Jun 04, 2022

DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale

Jan 14, 2022

What's Hidden in a One-layer Randomly Weighted Transformer?

Sep 08, 2021

How Much Can CLIP Benefit Vision-and-Language Tasks?

Jul 13, 2021

MLPruning: A Multilevel Structured Pruning Framework for Transformer-based Models

May 30, 2021

ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training

Apr 29, 2021

Q-ASR: Integer-only Zero-shot Quantization for Efficient Speech Recognition

Mar 31, 2021

A Survey of Quantization Methods for Efficient Neural Network Inference

Mar 25, 2021