Yiqi Chen

PTQ4ViT: Post-Training Quantization Framework for Vision Transformers

Nov 24, 2021
Zhihang Yuan, Chenhao Xue, Yiqi Chen, Qiang Wu, Guangyu Sun

Quantization is one of the most effective methods for compressing neural networks and has achieved great success on convolutional neural networks (CNNs). Recently, vision transformers have demonstrated great potential in computer vision. However, previous post-training quantization methods do not perform well on vision transformers, incurring more than a 1% accuracy drop even with 8-bit quantization. We therefore analyze the problems of quantizing vision transformers. We observe that the distributions of activation values after the softmax and GELU functions differ significantly from a Gaussian distribution, and that common quantization metrics, such as MSE and cosine distance, are inaccurate for determining the optimal scaling factor. In this paper, we propose a twin uniform quantization method to reduce the quantization error on these activation values, and a Hessian-guided metric for evaluating different scaling factors, which improves calibration accuracy at a small cost. To enable fast quantization of vision transformers, we develop an efficient framework, PTQ4ViT. Experiments show that the quantized vision transformers achieve near-lossless prediction accuracy (less than a 0.5% drop at 8-bit quantization) on the ImageNet classification task.
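The twin uniform quantization idea can be pictured with a short sketch. The snippet below is a minimal NumPy illustration, not the authors' implementation: it quantizes non-negative, post-softmax-like activations with two uniform ranges whose scales differ by a power of two and compares the reconstruction error against a single uniform range. The bit width, split rule, and shift m are hypothetical choices; PTQ4ViT additionally calibrates the scales with its Hessian-guided metric.

```python
import numpy as np

def twin_uniform_quantize(x, k=8, m=4):
    """Illustrative twin-range uniform quantizer for non-negative activations.

    Small values use a fine scale s1; large values use a coarse scale
    s2 = 2**m * s1, so the two ranges differ only by a bit shift.
    k, m, and the split rule are hypothetical, not the paper's calibrated values.
    """
    levels = 2 ** (k - 1)              # one bit of the k-bit code selects the range
    s2 = x.max() / levels              # coarse range covers the full data range
    s1 = s2 / 2 ** m                   # fine range for the dense region near zero
    use_fine = x <= s1 * levels
    q_fine = np.clip(np.round(x / s1), 0, levels - 1) * s1
    q_coarse = np.clip(np.round(x / s2), 0, levels - 1) * s2
    return np.where(use_fine, q_fine, q_coarse)

# toy post-softmax-like data: most values tiny, a few close to 1
rng = np.random.default_rng(0)
x = np.concatenate([rng.random(4096) * 0.01, rng.random(16)])
s = x.max() / 255                      # single 8-bit uniform range for comparison
single = np.clip(np.round(x / s), 0, 255) * s
print("single-range MSE:", np.mean((x - single) ** 2))
print("twin-range MSE:  ", np.mean((x - twin_uniform_quantize(x)) ** 2))
```

On such skewed distributions the fine range captures the dense region near zero while the coarse range still covers the outliers, which is the intuition behind using two scales instead of one.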

PTQ-SL: Exploring the Sub-layerwise Post-training Quantization

Oct 18, 2021
Zhihang Yuan, Yiqi Chen, Chenhao Xue, Chenguang Zhang, Qiankun Wang, Guangyu Sun

Network quantization is a powerful technique for compressing convolutional neural networks. The quantization granularity determines how scaling factors are shared among weights, which affects the performance of network quantization. Most existing approaches share the scaling factors layerwise or channelwise when quantizing convolutional layers, and both granularities have been widely used in various applications. However, other quantization granularities are rarely explored. In this paper, we explore a sub-layerwise granularity that shares a scaling factor across multiple input and output channels, and we propose an efficient post-training quantization method at this granularity (PTQ-SL). We then systematically experiment with various granularities and observe that the prediction accuracy of the quantized neural network is strongly correlated with the granularity. Moreover, we find that adjusting the position of the channels can improve the performance of sub-layerwise quantization, so we propose a method to reorder the channels for sub-layerwise quantization. The experiments demonstrate that sub-layerwise quantization with appropriate channel reordering can outperform channelwise quantization.
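As a rough illustration of the sub-layerwise granularity (not PTQ-SL itself), the sketch below quantizes a weight matrix with one scale per block of output and input channels, a granularity that sits between layerwise (one scale for the whole matrix) and channelwise (one scale per output channel). The block sizes and bit width are hypothetical; the paper's method additionally searches over granularities and reorders channels before grouping.

```python
import numpy as np

def sublayerwise_quantize(w, g_out=16, g_in=16, bits=4):
    """Quantize a (out_ch, in_ch) weight matrix with one scale per
    (g_out x g_in) block of channels. Group sizes and bit width are
    hypothetical choices for illustration only.
    """
    qmax = 2 ** (bits - 1) - 1
    out_ch, in_ch = w.shape
    q = np.empty_like(w)
    for i in range(0, out_ch, g_out):
        for j in range(0, in_ch, g_in):
            block = w[i:i + g_out, j:j + g_in]
            scale = np.abs(block).max() / qmax + 1e-12  # per-block scaling factor
            q[i:i + g_out, j:j + g_in] = np.clip(
                np.round(block / scale), -qmax, qmax) * scale
    return q

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)
print("block-wise quantization MSE:", np.mean((w - sublayerwise_quantize(w)) ** 2))
```

Setting g_out and g_in to the full matrix dimensions recovers layerwise quantization, while g_out = 1 and g_in = in_ch recovers channelwise quantization, which is what makes the granularity a tunable middle ground.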
