Title: MLPruning: A Multilevel Structured Pruning Framework for Transformer-based Models

May 30, 2021

Zhewei Yao, Linjian Ma, Sheng Shen, Kurt Keutzer, Michael W. Mahoney

Abstract: Pruning is an effective method to reduce the memory footprint and computational cost associated with large natural language processing models. However, current approaches either explore only head pruning, which has a limited pruning ratio, or focus only on unstructured pruning, which has negligible effects on real inference time and/or power consumption. To address these challenges, we develop a novel MultiLevel structured Pruning (MLPruning) framework, which uses three different levels of structured pruning: head pruning, row pruning, and block-wise sparse pruning. We propose using a learnable Top-k threshold, which employs adaptive regularization to adjust the regularization magnitude, to select appropriate pruning ratios for different weight matrices. We also propose a two-step pipeline to combine block-wise pruning with head/row pruning to achieve high structured pruning ratios with minimal accuracy degradation. Our empirical results show that for BERT-base, with approximately 20% of remaining weights, MLPruning can achieve an accuracy that is comparable to the full model on QQP/MNLI/SQuAD, with up to approximately 3.69x speedup. Our framework has been open sourced [codebase].
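
As a rough illustration of the block-wise level of pruning mentioned in the abstract, the sketch below keeps only the highest-scoring square blocks of a weight matrix and zeroes the rest. It is not the authors' implementation: the abstract describes a learnable Top-k threshold with adaptive regularization, whereas this sketch assumes a fixed `keep_ratio` and scores each block by the sum of its absolute weights. The function names `block_scores` and `prune_blocks_topk`, and both parameters, are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def block_scores(weights: Matrix, block: int) -> List[List[float]]:
    """Score each block x block tile by the sum of its absolute weights.

    Assumption: magnitude is used as a stand-in score; the paper instead
    learns its Top-k threshold during training.
    """
    rows, cols = len(weights), len(weights[0])
    if rows % block or cols % block:
        raise ValueError("matrix dimensions must be divisible by the block size")
    scores = []
    for bi in range(rows // block):
        row_scores = []
        for bj in range(cols // block):
            total = 0.0
            for i in range(bi * block, (bi + 1) * block):
                for j in range(bj * block, (bj + 1) * block):
                    total += abs(weights[i][j])
            row_scores.append(total)
        scores.append(row_scores)
    return scores


def prune_blocks_topk(weights: Matrix, block: int, keep_ratio: float) -> Matrix:
    """Keep the top-k tiles by score and zero out every other tile."""
    scores = block_scores(weights, block)
    ranked = sorted(
        ((s, bi, bj) for bi, row in enumerate(scores) for bj, s in enumerate(row)),
        key=lambda t: t[0],
        reverse=True,
    )
    # Fixed keep ratio (an assumption); always keep at least one tile.
    k = max(1, round(keep_ratio * len(ranked)))
    kept = {(bi, bj) for _, bi, bj in ranked[:k]}
    return [
        [w if (i // block, j // block) in kept else 0.0 for j, w in enumerate(row)]
        for i, row in enumerate(weights)
    ]


# Toy 4x4 matrix with two strong 2x2 tiles on the diagonal.
W = [
    [0.9, -0.8, 0.01, 0.02],
    [0.7, 0.6, -0.03, 0.01],
    [0.02, 0.01, 0.5, -0.4],
    [-0.01, 0.03, 0.3, 0.2],
]
pruned = prune_blocks_topk(W, block=2, keep_ratio=0.5)
for row in pruned:
    print(row)
```

Because whole tiles are removed rather than scattered individual weights, the surviving weights stay in dense blocks, which is the property that lets structured pruning translate into real inference speedups.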

* 20 pages, 4 figures, 9 tables
