



Transformer-based pre-trained language models achieve superior performance on most NLP tasks thanks to their large parameter capacity, but at the cost of heavy computation. Fortunately, we observe that most inputs only activate a tiny fraction of the neurons of large Transformer-based models during inference. Hence, we propose to transform a large model into its mixture-of-experts (MoE) version of equal model size, namely MoEfication, which accelerates large-model inference through conditional computation based on this sparse activation phenomenon. MoEfication consists of two steps: (1) splitting the parameters of the feed-forward neural networks (FFNs) into multiple parts as experts, and (2) building expert routers that decide which experts will be used for each input. Experimental results show that MoEfied models can significantly reduce computation cost, e.g., activating only 20% of the FFN parameters of a 700-million-parameter model without performance degradation on several downstream tasks, including text classification and machine reading comprehension.
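To make the two steps concrete, below is a minimal PyTorch sketch (not the released implementation) of the MoEfication idea for a standard Transformer FFN of the form W_out ReLU(W_in x + b_in) + b_out. The class name MoEfiedFFN, the contiguous neuron-to-expert assignment, and the single linear router are illustrative assumptions; for clarity the sketch masks the neurons of unselected experts instead of skipping their computation, which is where the real speedup would come from.

```python
# Minimal sketch of MoEfication, assuming a standard two-layer FFN.
# Contiguous expert splitting and a linear router are illustrative choices.
import torch
import torch.nn as nn


class MoEfiedFFN(nn.Module):
    def __init__(self, ffn_in: nn.Linear, ffn_out: nn.Linear,
                 num_experts: int = 16, top_k: int = 4):
        super().__init__()
        d_model, d_ff = ffn_in.in_features, ffn_in.out_features
        assert d_ff % num_experts == 0
        self.expert_size = d_ff // num_experts
        self.top_k = top_k

        # Step 1: split the FFN parameters into experts. Here the hidden
        # neurons are sliced contiguously; the paper also considers
        # co-activation-based splits.
        self.w_in = nn.Parameter(ffn_in.weight.detach().clone())    # (d_ff, d_model)
        self.b_in = nn.Parameter(ffn_in.bias.detach().clone())      # (d_ff,)
        self.w_out = nn.Parameter(ffn_out.weight.detach().clone())  # (d_model, d_ff)
        self.b_out = nn.Parameter(ffn_out.bias.detach().clone())    # (d_model,)

        # Step 2: an expert router that scores experts for each input token.
        self.router = nn.Linear(d_model, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, d_model)
        scores = self.router(x)                                # (batch, num_experts)
        top_idx = scores.topk(self.top_k, dim=-1).indices      # (batch, top_k)

        # Neuron-level mask: only neurons of the selected experts stay active.
        expert_mask = torch.zeros_like(scores).scatter_(1, top_idx, 1.0)
        neuron_mask = expert_mask.repeat_interleave(self.expert_size, dim=1)

        hidden = torch.relu(x @ self.w_in.T + self.b_in) * neuron_mask
        return hidden @ self.w_out.T + self.b_out


# Usage sketch: wrap an existing FFN's two linear layers.
ffn_in, ffn_out = nn.Linear(768, 3072), nn.Linear(3072, 768)
moe_ffn = MoEfiedFFN(ffn_in, ffn_out, num_experts=16, top_k=4)
out = moe_ffn(torch.randn(8, 768))
```

With top_k = 4 out of 16 experts, each token uses 25% of the FFN neurons; an efficient implementation would gather and compute only those expert slices rather than masking them.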