Title: Regularizing Black-box Models for Improved Interpretability (HILL 2019 Version)

May 31, 2019

Gregory Plumb, Maruan Al-Shedivat, Eric Xing, Ameet Talwalkar



Abstract: Most of the work on interpretable machine learning has focused on designing either inherently interpretable models, which typically trade off accuracy for interpretability, or post-hoc explanation systems, which lack guarantees about their explanation quality. We propose an alternative to these approaches by directly regularizing a black-box model for interpretability at training time. Our approach explicitly connects three key aspects of interpretable machine learning: (i) the model's innate explainability, (ii) the explanation system used at test time, and (iii) the metrics that measure explanation quality. Our regularization results in substantial improvement in terms of the explanation fidelity and stability metrics across a range of datasets and black-box explanation systems while slightly improving accuracy. Further, if the resulting model is still not sufficiently interpretable, the weight of the regularization term can be adjusted to achieve the desired trade-off between accuracy and interpretability. Finally, we justify theoretically that the benefits of explanation-based regularization generalize to unseen points.

* Presented at the 2019 ICML Workshop on Human in the Loop Learning (HILL 2019), Long Beach, USA. arXiv admin note: substantial text overlap with arXiv:1902.06787
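
The abstract describes adding an interpretability penalty to the training objective and tuning its weight. The sketch below illustrates that general idea only; it is not the authors' implementation. It assumes, for illustration, that fidelity is measured as the mean squared error between the model and a least-squares linear fit on Gaussian samples around a point. The names `neighborhood_fidelity`, `regularized_objective`, `sigma`, `n_samples`, and `weight` are hypothetical and do not come from the paper.

```python
import numpy as np


def neighborhood_fidelity(model, x, sigma=0.1, n_samples=50, rng=None):
    """Error of the best local linear fit to `model` near `x` (lower is better).

    Assumption: this is one plausible fidelity-style metric, not necessarily
    the one used in the paper.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    # Sample a Gaussian neighbourhood around x.
    neighbours = x + sigma * rng.standard_normal((n_samples, x.shape[0]))
    targets = np.array([model(z) for z in neighbours])
    # Fit a linear explanation (with intercept) by ordinary least squares.
    design = np.hstack([neighbours, np.ones((n_samples, 1))])
    coef, *_ = np.linalg.lstsq(design, targets, rcond=None)
    residuals = design @ coef - targets
    return float(np.mean(residuals ** 2))


def regularized_objective(task_loss, fidelity_penalty, weight):
    """Combine the task loss with the penalty; `weight` sets the trade-off."""
    return task_loss + weight * fidelity_penalty


# A linear model is perfectly explained by a local linear fit;
# a strongly nonlinear one is not.
x0 = np.array([1.0, 1.0])
linear_penalty = neighborhood_fidelity(lambda z: 2 * z[0] - z[1] + 3, x0, sigma=0.5)
nonlinear_penalty = neighborhood_fidelity(lambda z: np.sin(5 * z[0]) * z[1] ** 2, x0, sigma=0.5)
```

This sketch only evaluates the penalty at a single point. Training against it would additionally require differentiating through the penalty, for example with an automatic differentiation framework, which is outside the scope of this illustration.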
