Title: On the Validity of Self-Attention as Explanation in Transformer Models

Aug 12, 2019

Gino Brunner, Yang Liu, Damián Pascual, Oliver Richter, Roger Wattenhofer


Abstract: Explainability of deep learning systems is a vital requirement for many applications, yet it remains an unsolved problem. Recent self-attention-based models for natural language processing, such as the Transformer or BERT, offer hope of greater explainability by providing attention maps that can be inspected directly. However, looking only at the attention maps, one easily overlooks that attention is not computed over words but over hidden embeddings, which can themselves be mixed representations of multiple embeddings. We investigate to what extent the implicit assumption made in many recent papers (that hidden embeddings at all layers still correspond to the underlying words) is justified. We quantify how much embeddings are mixed using a gradient-based attribution method and find that after only the first layer, less than 50% of an embedding is attributed to its underlying word, declining to a median contribution of 7.5% in the last layer. Although the underlying word remains the largest single contributor to its embedding throughout the layers, we argue that attention visualizations are misleading and should be treated with care when used to explain the underlying deep learning system.
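To make "how much an embedding is mixed" concrete, here is a minimal sketch under stated assumptions. It uses a toy single-head self-attention layer with a residual connection and random weights, and, for each output embedding, measures the share of gradient magnitude coming from each input token. The gradient is approximated with central finite differences, and the share is the Frobenius norm of each Jacobian block normalized over input tokens. The layer, the weights, the helper names `attention_layer` and `token_contributions`, and this particular normalization are all hypothetical illustrations, not necessarily the exact attribution measure used by the authors.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention_layer(x, wq, wk, wv):
    """Hypothetical toy layer: single-head self-attention plus a residual connection."""
    d = wq.shape[1]
    scores = (x @ wq) @ (x @ wk).T / np.sqrt(d)
    return x + softmax(scores, axis=-1) @ (x @ wv)


def token_contributions(layer, x, eps=1e-5):
    """Return c where c[i, j] is the share of output embedding i attributed to input token j.

    Assumption: attribution = Frobenius norm of the Jacobian block d out_i / d x_j,
    approximated by central finite differences, then normalized over j.
    """
    n, d = x.shape
    norms = np.zeros((n, n))
    for j in range(n):
        jac = np.zeros((n, d, d))  # jac[i, :, k] = d out_i / d x[j, k]
        for k in range(d):
            xp, xm = x.copy(), x.copy()
            xp[j, k] += eps
            xm[j, k] -= eps
            jac[:, :, k] = (layer(xp) - layer(xm)) / (2 * eps)
        norms[:, j] = np.linalg.norm(jac, axis=(1, 2))
    return norms / norms.sum(axis=1, keepdims=True)


# Usage on random toy data (5 tokens, 8-dimensional embeddings).
rng = np.random.default_rng(0)
n, d = 5, 8
x = rng.normal(size=(n, d))
wq, wk, wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))


def layer(inp):
    return attention_layer(inp, wq, wk, wv)


c = token_contributions(layer, x)
print(np.round(np.diag(c), 3))  # share each position keeps from its own token
```

The diagonal entry `c[i, i]` is the share that position `i` retains from its own input token. Composing the toy layer with itself (for example, passing `lambda inp: layer(layer(inp))` to `token_contributions`) is one way to watch how that share changes with depth in this illustrative setting. Finite differences keep the sketch dependency-light; an automatic-differentiation framework would compute the same Jacobian exactly.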

* Preprint. Work in progress
