Title: Scheduled DropHead: A Regularization Method for Transformer Models

Apr 28, 2020

Wangchunshu Zhou, Tao Ge, Ke Xu, Furu Wei, Ming Zhou

Abstract: In this paper, we introduce DropHead, a structured dropout method specifically designed for regularizing the multi-head attention mechanism, a key component of the Transformer, a state-of-the-art model for various NLP tasks. In contrast to conventional dropout mechanisms, which randomly drop units or connections, DropHead drops entire attention heads during training. It prevents the multi-head attention model from being dominated by a small portion of attention heads and reduces the risk of overfitting the training data, thus making more efficient use of the multi-head attention mechanism. Motivated by recent studies of the learning dynamics of the multi-head attention mechanism, we propose a specific dropout rate schedule that adaptively adjusts the dropout rate of DropHead and achieves a better regularization effect. Experimental results on both machine translation and text classification benchmark datasets demonstrate the effectiveness of the proposed approach.
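
To make the idea of dropping whole attention heads concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The function names, the rescaling by the number of kept heads, the rule of always keeping at least one head, and the V-shaped schedule are all assumptions made for illustration; the abstract states only that entire heads are dropped during training and that the dropout rate follows a schedule.

```python
import random


def drophead_mask(num_heads, p, rng):
    """Return one scale factor per head: 0.0 if dropped, a rescale factor if kept."""
    keep = [rng.random() >= p for _ in range(num_heads)]
    if not any(keep):
        # Assumption: always keep at least one head so the output is never all zeros.
        keep[rng.randrange(num_heads)] = True
    # Assumption: rescale kept heads so the summed head weight stays equal to num_heads.
    scale = num_heads / sum(keep)
    return [scale if k else 0.0 for k in keep]


def apply_drophead(head_outputs, p, rng, training=True):
    """Zero out entire heads. head_outputs is a list of per-head vectors (lists of floats)."""
    if not training or p <= 0.0:
        return [list(head) for head in head_outputs]
    mask = drophead_mask(len(head_outputs), p, rng)
    return [[m * x for x in head] for m, head in zip(mask, head_outputs)]


def scheduled_rate(step, total_steps, p_max, turn=0.5):
    """Hypothetical V-shaped schedule: p_max -> 0 at the turning point -> p_max."""
    frac = step / total_steps
    if frac < turn:
        return p_max * (1.0 - frac / turn)
    return p_max * (frac - turn) / (1.0 - turn)


if __name__ == "__main__":
    rng = random.Random(0)
    heads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    p = scheduled_rate(step=10, total_steps=100, p_max=0.2)
    print(p, apply_drophead(heads, p, rng))
```

Unlike element-wise dropout, the mask here has one entry per head, so each head's output is either removed entirely or passed through with a shared rescale factor. At inference time (`training=False`) the outputs are returned unchanged.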
