Satyapriya Krishna

Towards Bridging the Gaps between the Right to Explanation and the Right to be Forgotten

Feb 10, 2023
Satyapriya Krishna, Jiaqi Ma, Himabindu Lakkaraju

The Right to Explanation and the Right to be Forgotten are two important principles outlined to regulate algorithmic decision making and data usage in real-world applications. While the right to explanation allows individuals to request an actionable explanation for an algorithmic decision, the right to be forgotten grants them the right to ask for their data to be deleted from all the databases and models of an organization. Intuitively, enforcing the right to be forgotten may trigger model updates which in turn invalidate previously provided explanations, thus violating the right to explanation. In this work, we investigate the technical implications arising from the interference between the two aforementioned regulatory principles, and propose the first algorithmic framework to resolve the tension between them. To this end, we formulate a novel optimization problem to generate explanations that are robust to model updates due to the removal of training data instances by data deletion requests. We then derive an efficient approximation algorithm to handle the combinatorial complexity of this optimization problem. We theoretically demonstrate that our method generates explanations that are provably robust to worst-case data deletion requests with bounded costs for linear models and certain classes of non-linear models. Extensive experimentation with real-world datasets demonstrates the efficacy of the proposed framework.
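To illustrate the kind of robustness the framework targets, the sketch below checks whether a recourse point keeps its desired label under every single-training-point deletion from an L2-regularized logistic regression, using the standard influence-function approximation of leave-one-out retraining. This is a minimal sketch of the underlying idea, not the paper's optimization algorithm; the single-deletion setting and all function names are illustrative assumptions.

```python
# Minimal sketch (not the paper's algorithm): test whether a recourse point
# x_cf keeps its desired label when any single training example is deleted,
# approximating leave-one-out retraining with influence functions.
import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loo_parameter_shifts(X, y, theta, l2=1.0):
    """Approximate theta_{-i} - theta for every training point i."""
    n, d = X.shape
    p = sigmoid(X @ theta)
    # Hessian of the mean regularized log-loss at theta.
    H = (X.T * (p * (1.0 - p))) @ X / n + l2 * np.eye(d)
    H_inv = np.linalg.inv(H)
    grads = (p - y)[:, None] * X          # per-example log-loss gradients, (n, d)
    return (grads @ H_inv) / n            # row i ~ theta_{-i} - theta

def recourse_is_robust(x_cf, X, y, theta, desired=1, l2=1.0):
    """True if x_cf keeps the desired label under every single deletion."""
    shifts = loo_parameter_shifts(X, y, theta, l2)
    scores = (theta + shifts) @ x_cf      # score of x_cf under each theta_{-i}
    return bool(np.all((scores > 0).astype(int) == desired))

# Toy usage; regularization strengths are only roughly matched to sklearn's.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
theta = LogisticRegression(fit_intercept=False).fit(X, y).coef_.ravel()
print(recourse_is_robust(np.array([2.0, 2.0, 0.0]), X, y, theta))
```

The paper's framework instead optimizes the explanation itself so that it remains valid under worst-case deletion requests with bounded cost, rather than merely checking robustness after the fact.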

TalkToModel: Understanding Machine Learning Models With Open Ended Dialogues

Jul 08, 2022
Dylan Slack, Satyapriya Krishna, Himabindu Lakkaraju, Sameer Singh

Machine Learning (ML) models are increasingly used to make critical decisions in real-world applications, yet they have also become more complex, making them harder to understand. To this end, several techniques to explain model predictions have been proposed. However, practitioners struggle to leverage explanations because they often do not know which to use, how to interpret the results, and may have insufficient data science experience to obtain explanations. In addition, most current works focus on generating one-shot explanations and do not allow users to follow up and ask fine-grained questions about the explanations, which can be frustrating. In this work, we address these challenges by introducing TalkToModel: an open-ended dialogue system for understanding machine learning models. Specifically, TalkToModel comprises three key components: 1) a natural language interface for engaging in dialogues, making understanding ML models highly accessible, 2) a dialogue engine that adapts to any tabular model and dataset, interprets natural language, maps it to appropriate operations (e.g., feature importance explanations, counterfactual explanations, showing model errors), and generates text responses, and 3) an execution component that runs the operations and ensures explanations are accurate. We carried out quantitative and human-subject evaluations of TalkToModel. We found the system understands user questions on novel datasets and models with high accuracy, demonstrating the system's capacity to generalize to new situations. In human evaluations, 73% of healthcare workers (e.g., doctors and nurses) agreed they would use TalkToModel over baseline point-and-click systems, and 84.6% of ML graduate students agreed TalkToModel was easier to use.

* Pre-print; comments welcome! Reach out to dslack@uci.edu 
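To make the "dialogue engine maps language to operations" step concrete, here is a deliberately simplified sketch in which keyword routing stands in for TalkToModel's learned parser; the operation names, keywords, and canned responses are hypothetical, not the system's actual interface.

```python
# Deliberately simplified sketch: keyword routing stands in for TalkToModel's
# learned parser; operation names, keywords, and responses are hypothetical.
from dataclasses import dataclass
from typing import Callable, Dict, Optional

@dataclass
class Operation:
    name: str
    run: Callable[[dict], str]    # takes a parsed context, returns a text reply

def feature_importance(ctx: dict) -> str:
    return "The most influential features for this prediction are ..."

def counterfactual(ctx: dict) -> str:
    return "The smallest change that would flip this prediction is ..."

def show_errors(ctx: dict) -> str:
    return "The model is most often wrong on instances where ..."

ROUTES: Dict[str, Operation] = {
    "important": Operation("feature_importance", feature_importance),
    "why": Operation("feature_importance", feature_importance),
    "change": Operation("counterfactual", counterfactual),
    "flip": Operation("counterfactual", counterfactual),
    "wrong": Operation("show_errors", show_errors),
    "mistake": Operation("show_errors", show_errors),
}

def respond(utterance: str, ctx: Optional[dict] = None) -> str:
    """Map a user question to an operation and execute it."""
    ctx = ctx or {}
    for keyword, op in ROUTES.items():
        if keyword in utterance.lower():
            return op.run(ctx)
    return "I can explain feature importances, counterfactuals, or model errors."

print(respond("Why did the model predict this applicant as high risk?"))
```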

OpenXAI: Towards a Transparent Evaluation of Model Explanations

Jun 22, 2022
Chirag Agarwal, Eshika Saxena, Satyapriya Krishna, Martin Pawelczyk, Nari Johnson, Isha Puri, Marinka Zitnik, Himabindu Lakkaraju

While several types of post hoc explanation methods (e.g., feature attribution methods) have been proposed in recent literature, there is little to no work on systematically benchmarking these methods in an efficient and transparent manner. Here, we introduce OpenXAI, a comprehensive and extensible open source framework for evaluating and benchmarking post hoc explanation methods. OpenXAI comprises the following key components: (i) a flexible synthetic data generator and a collection of diverse real-world datasets, pre-trained models, and state-of-the-art feature attribution methods, (ii) open-source implementations of twenty-two quantitative metrics for evaluating faithfulness, stability (robustness), and fairness of explanation methods, and (iii) the first-ever public XAI leaderboards to benchmark explanations. OpenXAI is easily extensible, as users can readily evaluate custom explanation methods and incorporate them into our leaderboards. Overall, OpenXAI provides an automated end-to-end pipeline that not only simplifies and standardizes the evaluation of post hoc explanation methods, but also promotes transparency and reproducibility in benchmarking these methods. OpenXAI datasets and data loaders, implementations of state-of-the-art explanation methods and evaluation metrics, as well as leaderboards are publicly available at https://open-xai.github.io/.

* Preprint 
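To give a flavor of the kind of metric such a benchmark standardizes, the sketch below computes a simple faithfulness-style score: top-k agreement between an attribution method's output and the ground-truth importances of a known linear model. It is a generic illustration, not the OpenXAI package API; consult https://open-xai.github.io/ for the actual dataloaders, explainers, and metrics.

```python
# Generic illustration of a faithfulness-style metric (top-k agreement with a
# known linear model's coefficients); not the OpenXAI package API.
import numpy as np

def topk_feature_agreement(attributions: np.ndarray,
                           ground_truth: np.ndarray,
                           k: int = 3) -> float:
    """Mean fraction of each instance's top-k attributed features that match
    the top-k ground-truth features.

    attributions: (n_instances, n_features) explanation scores
    ground_truth: (n_features,), e.g. |coefficients| of a linear model
    """
    true_topk = set(np.argsort(-np.abs(ground_truth))[:k])
    overlaps = [
        len(set(np.argsort(-np.abs(a))[:k]) & true_topk) / k
        for a in attributions
    ]
    return float(np.mean(overlaps))

# Toy usage with random attributions against a known 5-feature linear model.
rng = np.random.default_rng(0)
coef = np.array([3.0, -2.0, 0.1, 0.0, 1.5])
print(topk_feature_agreement(rng.normal(size=(10, 5)), coef, k=2))
```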

Mitigating Gender Bias in Distilled Language Models via Counterfactual Role Reversal

Mar 23, 2022
Umang Gupta, Jwala Dhamala, Varun Kumar, Apurv Verma, Yada Pruksachatkun, Satyapriya Krishna, Rahul Gupta, Kai-Wei Chang, Greg Ver Steeg, Aram Galstyan

Language models excel at generating coherent text, and model compression techniques such as knowledge distillation have enabled their use in resource-constrained settings. However, these models can be biased in multiple ways, including the unfounded association of male and female genders with gender-neutral professions. Therefore, knowledge distillation without any fairness constraints may preserve or even exaggerate the teacher model's biases in the distilled model. To this end, we present a novel approach to mitigate gender disparity in text generation by learning a fair model during knowledge distillation. We propose two modifications to the base knowledge distillation based on counterfactual role reversal: modifying teacher probabilities and augmenting the training set. We evaluate gender polarity across professions in open-ended text generated from the resulting distilled and finetuned GPT-2 models and demonstrate a substantial reduction in gender disparity with only a minor compromise in utility. Finally, we observe that language models that reduce gender polarity in language generation do not improve embedding fairness or downstream classification fairness.

* To appear in the Findings of ACL 2022 
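A rough sketch of the first modification, adjusting the teacher's next-token distribution over gendered word pairs before computing the distillation loss, is shown below. The token ids, the equalization rule (averaging each pair's probabilities), and the loss form are illustrative assumptions rather than the paper's exact procedure.

```python
# Illustrative sketch only: average the teacher's probabilities over gendered
# token pairs before distillation. Token ids, the pair list, and the loss form
# are placeholders, not the paper's exact configuration.
import torch

GENDER_PAIRS = [(262, 263), (1544, 1545)]   # hypothetical vocabulary ids, e.g. ("he", "she")

def equalize_gender_probs(teacher_probs: torch.Tensor) -> torch.Tensor:
    """teacher_probs: (batch, seq_len, vocab) softmax outputs of the teacher.
    Each gendered pair is given the mean of its two probabilities, so the
    student is not pushed toward either gendered continuation."""
    probs = teacher_probs.clone()
    for a, b in GENDER_PAIRS:
        mean = 0.5 * (probs[..., a] + probs[..., b])
        probs[..., a] = mean
        probs[..., b] = mean
    return probs                             # total probability mass is unchanged

def distillation_loss(student_logits: torch.Tensor,
                      teacher_probs: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """Soft-target cross-entropy against the modified teacher distribution."""
    log_q = torch.log_softmax(student_logits / temperature, dim=-1)
    p = equalize_gender_probs(teacher_probs)
    return -(p * log_q).sum(dim=-1).mean() * temperature ** 2

# Toy usage with a small vocabulary-sized tensor.
vocab, batch, seq = 2048, 2, 4
student_logits = torch.randn(batch, seq, vocab)
teacher_probs = torch.softmax(torch.randn(batch, seq, vocab), dim=-1)
print(distillation_loss(student_logits, teacher_probs).item())
```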

Measuring Fairness of Text Classifiers via Prediction Sensitivity

Mar 16, 2022
Satyapriya Krishna, Rahul Gupta, Apurv Verma, Jwala Dhamala, Yada Pruksachatkun, Kai-Wei Chang

With the rapid growth in language processing applications, fairness has emerged as an important consideration in data-driven solutions. Although various fairness definitions have been explored in the recent literature, there is a lack of consensus on which metrics most accurately reflect the fairness of a system. In this work, we propose a new formulation, ACCUMULATED PREDICTION SENSITIVITY, which measures fairness in machine learning models based on the model's prediction sensitivity to perturbations in input features. The metric attempts to quantify the extent to which a single prediction depends on a protected attribute, where the protected attribute encodes the membership status of an individual in a protected group. We show that the metric can be theoretically linked to a specific notion of group fairness (statistical parity) and to individual fairness. It also correlates well with humans' perception of fairness. We conduct experiments on two text classification datasets, JIGSAW TOXICITY and BIAS IN BIOS, and evaluate the correlation between the metrics and manual annotations of whether the model produced a fair outcome. We observe that the proposed fairness metric based on prediction sensitivity is statistically significantly more correlated with human annotation than the existing counterfactual fairness metric.
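The core quantity is the sensitivity of a prediction to the protected attribute; a gradient-based sketch of that idea is given below. The masking scheme and the use of a plain input gradient are simplifications of the accumulated prediction sensitivity described above, not its exact estimator, and the toy model is made up.

```python
# Gradient-based illustration of prediction sensitivity to protected features;
# a simplification of the metric described above, with a made-up toy model.
import torch

def prediction_sensitivity(model: torch.nn.Module,
                           x: torch.Tensor,
                           protected_mask: torch.Tensor) -> torch.Tensor:
    """x: (batch, n_features); protected_mask: (n_features,) with 1s on the
    dimensions that encode protected-group membership. Returns one
    sensitivity score per example."""
    x = x.clone().requires_grad_(True)
    scores = model(x).squeeze(-1)                      # (batch,) predicted scores
    grads, = torch.autograd.grad(scores.sum(), x)
    return (grads.abs() * protected_mask).sum(dim=-1)  # gradient mass on protected dims

# Toy usage: a linear scorer over 5 features, with the last feature protected.
model = torch.nn.Linear(5, 1)
mask = torch.tensor([0.0, 0.0, 0.0, 0.0, 1.0])
print(prediction_sensitivity(model, torch.randn(4, 5), mask))
```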

Rethinking Stability for Attribution-based Explanations

Mar 14, 2022
Chirag Agarwal, Nari Johnson, Martin Pawelczyk, Satyapriya Krishna, Eshika Saxena, Marinka Zitnik, Himabindu Lakkaraju

As attribution-based explanation methods are increasingly used to establish model trustworthiness in high-stakes situations, it is critical to ensure that these explanations are stable, e.g., robust to infinitesimal perturbations of an input. However, previous works have shown that state-of-the-art explanation methods generate unstable explanations. Here, we introduce metrics to quantify the stability of an explanation and show that several popular explanation methods are unstable. In particular, we propose new Relative Stability metrics that measure the change in the output explanation with respect to changes in the input, the model representation, or the output of the underlying predictor. Finally, our experimental evaluation with three real-world datasets yields interesting insights for seven explanation methods across different stability metrics.
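A minimal sketch of a relative-stability style computation follows: the worst-case percent change in an explanation divided by the percent change in the input, over random nearby perturbations. The normalization and the Gaussian perturbation model are simplifying assumptions relative to the metrics defined in the paper.

```python
# Simplified relative-input-stability style computation: worst-case percent
# change in the explanation over the percent change in the input, across
# random Gaussian perturbations. Normalization details differ from the paper.
import numpy as np

def relative_input_stability(x, explain_fn, n_perturbations=50,
                             sigma=0.05, eps=1e-8, seed=0):
    """explain_fn maps an input vector to an attribution vector."""
    rng = np.random.default_rng(seed)
    e_x = explain_fn(x)
    worst = 0.0
    for _ in range(n_perturbations):
        x_p = x + rng.normal(scale=sigma, size=x.shape)
        e_p = explain_fn(x_p)
        d_expl = np.linalg.norm((e_x - e_p) / (np.abs(e_x) + eps))
        d_input = np.linalg.norm((x - x_p) / (np.abs(x) + eps))
        worst = max(worst, d_expl / max(d_input, eps))
    return worst

# Toy usage: the "gradient" explanation of a fixed linear model never changes,
# so its relative input stability is 0.
w = np.array([1.0, -2.0, 0.5])
print(relative_input_stability(np.array([0.3, 0.7, 1.2]), lambda x: w))
```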

The Disagreement Problem in Explainable Machine Learning: A Practitioner's Perspective

Feb 08, 2022
Satyapriya Krishna, Tessa Han, Alex Gu, Javin Pombra, Shahin Jabbari, Steven Wu, Himabindu Lakkaraju

As various post hoc explanation methods are increasingly being leveraged to explain complex models in high-stakes settings, it becomes critical to develop a deeper understanding of whether and when the explanations output by these methods disagree with each other, and how such disagreements are resolved in practice. However, there is little to no research that provides answers to these critical questions. In this work, we introduce and study the disagreement problem in explainable machine learning. More specifically, we formalize the notion of disagreement between explanations, analyze how often such disagreements occur in practice, and examine how practitioners resolve these disagreements. To this end, we first conduct interviews with data scientists to understand what constitutes disagreement between explanations generated by different methods for the same model prediction, and introduce a novel quantitative framework to formalize this understanding. We then leverage this framework to carry out a rigorous empirical analysis with four real-world datasets, six state-of-the-art post hoc explanation methods, and eight different predictive models, to measure the extent of disagreement between the explanations generated by various popular explanation methods. In addition, we carry out an online user study with data scientists to understand how they resolve the aforementioned disagreements. Our results indicate that state-of-the-art explanation methods often disagree in terms of the explanations they output. Our findings also underscore the importance of developing principled evaluation metrics that enable practitioners to effectively compare explanations.
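Two of the simpler disagreement measures in this spirit, top-k feature agreement and sign agreement between a pair of attribution vectors, can be sketched as follows; the framework in the paper defines several additional metrics, and the example attributions are made up.

```python
# Sketch of two disagreement measures between a pair of attribution vectors
# (top-k feature agreement and sign agreement); example values are made up.
import numpy as np

def top_k(attr: np.ndarray, k: int) -> set:
    return set(np.argsort(-np.abs(attr))[:k])

def feature_agreement(attr_a, attr_b, k=5):
    """Fraction of the top-k features that the two explanations share."""
    return len(top_k(attr_a, k) & top_k(attr_b, k)) / k

def sign_agreement(attr_a, attr_b, k=5):
    """Fraction of the top-k features shared with the same attribution sign."""
    common = top_k(attr_a, k) & top_k(attr_b, k)
    return sum(np.sign(attr_a[i]) == np.sign(attr_b[i]) for i in common) / k

# Hypothetical attributions from two different explanation methods.
method_a = np.array([0.9, -0.4, 0.1, 0.0, -0.7])
method_b = np.array([0.8, 0.3, 0.2, 0.0, -0.6])
print(feature_agreement(method_a, method_b, k=3),
      sign_agreement(method_a, method_b, k=3))
```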

Towards Realistic Single-Task Continuous Learning Research for NER

Oct 27, 2021
Justin Payan, Yuval Merhav, He Xie, Satyapriya Krishna, Anil Ramakrishna, Mukund Sridhar, Rahul Gupta

There is an increasing interest in continuous learning (CL), as data privacy is becoming a priority for real-world machine learning applications. Meanwhile, there is still a lack of academic NLP benchmarks that are applicable to realistic CL settings, which is a major challenge for the advancement of the field. In this paper, we discuss some of the unrealistic data characteristics of public datasets and study the challenges of realistic single-task continuous learning, as well as the effectiveness of data rehearsal as a way to mitigate accuracy loss. We construct a CL NER dataset from an existing publicly available dataset and release it, along with the code, to the research community.

* 11 pages, 2 figures, Findings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP) (short paper), November 2021 
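A generic sketch of the data-rehearsal idea studied here, a reservoir-sampled buffer of earlier examples mixed into each new training period, is shown below; it illustrates the technique only and is unrelated to the released dataset and code.

```python
# Generic illustration of data rehearsal for continual learning: keep a small
# reservoir-sampled buffer of earlier examples and mix it into new training
# periods. Buffer size and example format are placeholders.
import random

class RehearsalBuffer:
    def __init__(self, capacity: int = 1000, seed: int = 0):
        self.capacity = capacity
        self.buffer = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, example) -> None:
        """Reservoir sampling keeps a uniform sample of everything seen so far."""
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(example)
        else:
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.buffer[j] = example

    def mix_into(self, new_period):
        """Training data for a period = new annotations + rehearsed old ones."""
        return list(new_period) + list(self.buffer)

buf = RehearsalBuffer(capacity=3)
for sent in ["old sent 1", "old sent 2", "old sent 3", "old sent 4"]:
    buf.add(sent)
print(buf.mix_into(["new sent 1", "new sent 2"]))
```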

Does Robustness Improve Fairness? Approaching Fairness with Word Substitution Robustness Methods for Text Classification

Jun 21, 2021
Yada Pruksachatkun, Satyapriya Krishna, Jwala Dhamala, Rahul Gupta, Kai-Wei Chang

Existing bias mitigation methods to reduce disparities in model outcomes across cohorts have focused on data augmentation, debiasing model embeddings, or adding fairness-based optimization objectives during training. Separately, certified word substitution robustness methods have been developed to decrease the impact of spurious features and synonym substitutions on model predictions. While their end goals are different, they both aim to encourage models to make the same prediction for certain changes in the input. In this paper, we investigate the utility of certified word substitution robustness methods for improving equality of odds and equality of opportunity on multiple text classification tasks. We observe that certified robustness methods improve fairness, and that using both robustness and bias mitigation methods during training yields improvements on both fronts.
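Equality of odds, one of the two fairness criteria evaluated here, can be measured as the gap in true-positive and false-positive rates between cohorts; a small sketch with made-up labels, predictions, and group assignments follows.

```python
# Sketch of measuring an equality-of-odds gap between two cohorts: the larger
# of the true-positive-rate and false-positive-rate differences. All data
# below is made up for illustration.
import numpy as np

def conditional_rate(y_true, y_pred, condition):
    """P(pred = 1 | y = condition) within the given arrays."""
    mask = y_true == condition
    return float(np.mean(y_pred[mask] == 1)) if mask.any() else float("nan")

def equality_of_odds_gap(y_true, y_pred, groups):
    gaps = []
    for condition in (1, 0):   # y=1 gives the TPR gap, y=0 the FPR gap
        r0 = conditional_rate(y_true[groups == 0], y_pred[groups == 0], condition)
        r1 = conditional_rate(y_true[groups == 1], y_pred[groups == 1], condition)
        gaps.append(abs(r0 - r1))
    return max(gaps)

y_true = np.array([1, 1, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0, 0, 0])
groups = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(equality_of_odds_gap(y_true, y_pred, groups))   # 0.5 in this toy case
```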
