Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:APQ: Joint Search for Network Architecture, Pruning and Quantization Policy

Jun 15, 2020

Tianzhe Wang, Kuan Wang, Han Cai, Ji Lin, Zhijian Liu, Song Han

Figure 1 for APQ: Joint Search for Network Architecture, Pruning and Quantization Policy

Figure 2 for APQ: Joint Search for Network Architecture, Pruning and Quantization Policy

Figure 3 for APQ: Joint Search for Network Architecture, Pruning and Quantization Policy

Figure 4 for APQ: Joint Search for Network Architecture, Pruning and Quantization Policy

Share this with someone who'll enjoy it:

Abstract:We present APQ for efficient deep learning inference on resource-constrained hardware. Unlike previous methods that separately search the neural architecture, pruning policy, and quantization policy, we optimize them in a joint manner. To deal with the larger design space it brings, a promising approach is to train a quantization-aware accuracy predictor to quickly get the accuracy of the quantized model and feed it to the search engine to select the best fit. However, training this quantization-aware accuracy predictor requires collecting a large number of quantized <model, accuracy> pairs, which involves quantization-aware finetuning and thus is highly time-consuming. To tackle this challenge, we propose to transfer the knowledge from a full-precision (i.e., fp32) accuracy predictor to the quantization-aware (i.e., int8) accuracy predictor, which greatly improves the sample efficiency. Besides, collecting the dataset for the fp32 accuracy predictor only requires to evaluate neural networks without any training cost by sampling from a pretrained once-for-all network, which is highly efficient. Extensive experiments on ImageNet demonstrate the benefits of our joint optimization approach. With the same accuracy, APQ reduces the latency/energy by 2x/1.3x over MobileNetV2+HAQ. Compared to the separate optimization approach (ProxylessNAS+AMC+HAQ), APQ achieves 2.3% higher ImageNet accuracy while reducing orders of magnitude GPU hours and CO2 emission, pushing the frontier for green AI that is environmental-friendly. The code and video are publicly available.

* Accepted by CVPR 2020

View paper on

Share this with someone who'll enjoy it:

Title:APQ: Joint Search for Network Architecture, Pruning and Quantization Policy

Paper and Code