Title: Learning to Deceive with Attention-Based Explanations

Sep 17, 2019

Danish Pruthi, Mansi Gupta, Bhuwan Dhingra, Graham Neubig, Zachary C. Lipton

Abstract: Attention mechanisms are ubiquitous components in neural architectures applied in natural language processing. In addition to yielding gains in predictive accuracy, researchers often claim that attention weights confer interpretability, purportedly useful both for providing insights to practitioners and for explaining why a model makes its decisions to stakeholders. We call the latter use of attention mechanisms into question, demonstrating a simple method for training models to produce deceptive attention masks, diminishing the total weight assigned to designated impermissible tokens, even as the models are shown to nevertheless rely on these features to drive predictions. Across multiple models and datasets, our approach manipulates attention weights while paying surprisingly little cost in accuracy. Although our results do not rule out potential insights due to organically-trained attention, they cast doubt on attention's reliability as a tool for auditing algorithms, as in the context of fairness and accountability.

* Preprint. Ongoing work
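
The abstract describes training a model so that little attention weight lands on designated impermissible tokens. A minimal sketch of how such a training signal could look is given below. The abstract does not state the exact objective, so the penalty form here (negative log of the attention mass left on permitted tokens), the coefficient `lam`, and all function names are assumptions made for illustration, not the authors' implementation.

```python
import math


def softmax(scores):
    """Turn raw attention scores into weights that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def impermissible_mass(attention, mask):
    """Total attention weight on tokens flagged as impermissible."""
    return sum(a for a, flagged in zip(attention, mask) if flagged)


def attention_penalty(attention, mask, lam=1.0, eps=1e-12):
    """Assumed penalty: -lam * log(1 - mass on impermissible tokens).

    It is zero when no weight falls on flagged tokens and grows as more
    weight does. `eps` guards against log(0).
    """
    permitted = 1.0 - impermissible_mass(attention, mask)
    return -lam * math.log(max(permitted, eps))


def total_loss(task_loss, attention, mask, lam=1.0):
    """Hypothetical combined objective: task loss plus attention penalty."""
    return task_loss + attention_penalty(attention, mask, lam)


# Usage: three tokens, the first one flagged as impermissible.
weights = softmax([2.0, 0.5, 0.1])
mask = [True, False, False]
print(impermissible_mass(weights, mask))
print(total_loss(0.3, weights, mask, lam=1.0))
```

Minimizing a combined loss of this kind pushes attention away from the flagged tokens without preventing the model from using them through other pathways, which is the gap between displayed attention and actual reliance that the abstract highlights.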
