Title: Run-Time Efficient RNN Compression for Inference on Edge Devices

Jun 18, 2019

Urmish Thakker, Jesse Beu, Dibakar Gope, Ganesh Dasika, Matthew Mattina

Abstract: Recurrent neural networks (RNNs) can be large and compute-intensive, yet many applications that benefit from RNNs run on small devices with very limited compute and storage capabilities while still having run-time constraints. As a result, there is a need for compression techniques that achieve significant compression without negatively impacting inference run-time or task accuracy. This paper explores a new compressed RNN cell implementation called Hybrid Matrix Decomposition (HMD) that achieves this dual objective. The scheme divides the weight matrix into two parts: an unconstrained upper half and a lower half composed of rank-1 blocks. This results in output features where the upper sub-vector has "richer" features while the lower sub-vector has "constrained" features. HMD can compress RNNs by a factor of 2-4x while having a faster run-time than pruning and retaining more model accuracy than matrix factorization. We evaluate this technique on three benchmarks.
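The split described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`hmd_matvec`, `hmd_dense`, `hmd_param_count`) are hypothetical, and the layout of the lower part (rank-1 blocks stacked row-wise, each stored as a column vector `U[i]` and a row vector `V[i]`) is an assumption, since the abstract does not specify how the blocks are arranged. Plain Python lists are used to keep the example dependency-free.

```python
def hmd_matvec(A, U, V, x):
    """Compute y = W @ x for W = [A; B] without materializing B.

    A : unconstrained upper rows, a list of rows of length n.
    U, V : assumed storage for the lower rank-1 blocks, where
           block i is the outer product of U[i] and V[i].
    """
    # Upper sub-vector: ordinary dense matrix-vector product.
    upper = [sum(a * xj for a, xj in zip(row, x)) for row in A]

    # Lower sub-vector: (u v^T) x = u * (v . x), so each block costs
    # one dot product plus one scaling of u.
    lower = []
    for u, v in zip(U, V):
        s = sum(vj * xj for vj, xj in zip(v, x))
        lower.extend(ui * s for ui in u)
    return upper + lower


def hmd_dense(A, U, V):
    """Expand the factored form into the full weight matrix (for checking)."""
    W = [list(row) for row in A]
    for u, v in zip(U, V):
        W.extend([ui * vj for vj in v] for ui in u)
    return W


def hmd_param_count(A, U, V):
    """Number of stored parameters in the factored form."""
    return (sum(len(row) for row in A)
            + sum(len(u) for u in U)
            + sum(len(v) for v in V))


if __name__ == "__main__":
    A = [[1, 2, 3], [4, 5, 6]]          # 2 unconstrained rows
    U = [[1, 2], [3, -1]]               # two blocks, 2 rows each
    V = [[1, 0, 1], [0, 2, 0]]
    x = [1, 1, 1]
    print(hmd_matvec(A, U, V, x))
    print(hmd_param_count(A, U, V), "stored vs", 6 * 3, "dense")
```

Under this assumed layout, each lower block contributes one dot product with the input, which is why the lower sub-vector is "constrained": all output entries within a block are scalar multiples of the same projection of `x`.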

* Published at the 4th edition of the Workshop on Energy Efficient Machine Learning and Cognitive Computing for Embedded Applications, co-located with the International Symposium on Computer Architecture (ISCA) 2019, Phoenix, Arizona (https://www.emc2-workshop.com/isca-19)
