
Babak Salimi


Crab: Learning Certifiably Fair Predictive Models in the Presence of Selection Bias

Dec 21, 2022
Jiongli Zhu, Nazanin Sabri, Sainyam Galhotra, Babak Salimi


A recent explosion of research focuses on developing methods and tools for building fair predictive models. However, most of this work relies on the assumption that the training and testing data are representative of the target population on which the model will be deployed. In practice, real-world training data often suffer from selection bias and are not representative of the target population for many reasons, including the cost and feasibility of collecting and labeling data, historical discrimination, and individual biases. In this paper, we introduce a new framework for certifying and ensuring the fairness of predictive models trained on biased data. We take inspiration from query answering over incomplete and inconsistent databases to present and formalize the problem of consistent range approximation (CRA) of answers to queries about aggregate information for the target population. We aim to leverage background knowledge about the data collection process, the biased data, and limited or no auxiliary data sources to compute a range of answers for aggregate queries over the target population that are consistent with the available information. We then develop methods that use CRA of such aggregate queries to build predictive models that are certifiably fair on the target population even when no external information about that population is available during training. We evaluate our methods on real data and demonstrate improvements over the state of the art. Significantly, we show that enforcing fairness using our methods can lead to predictive models that are not only fair, but also more accurate on the target population.
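
The core idea can be illustrated with a deliberately simple sketch (this is not the paper's CRA algorithm, only worst-case Manski-style bounding): if we only know, or can bound, what fraction of a target group the biased sample covers, then an aggregate such as the group's positive rate is bracketed by letting the unobserved remainder take its extreme values. The function and the `coverage` parameter below are hypothetical illustrations.

```python
# Toy sketch of range approximation for an aggregate query under selection
# bias; NOT the paper's CRA method. Assumption: we know (or can bound) the
# fraction `coverage` of the target group that the biased sample observes.

def positive_rate_range(observed_labels, coverage):
    """Bounds on the target-population positive rate for one group.

    observed_labels : 0/1 labels for the group in the biased sample
    coverage        : assumed fraction of the target group that is observed
    """
    observed_rate = sum(observed_labels) / len(observed_labels)
    lower = coverage * observed_rate                    # unobserved part all 0s
    upper = coverage * observed_rate + (1 - coverage)   # unobserved part all 1s
    return lower, upper

# Example: 70% of the group is observed, with a 40% observed positive rate.
print(positive_rate_range([1, 0, 0, 1, 0], coverage=0.7))  # -> (0.28, 0.58)
```

A fairness metric built from such aggregates (for example, a difference of group positive rates) then inherits a range, and, intuitively, a model can be certified fair on the target population only if the fairness constraint holds across the entire range.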


Combining Counterfactuals With Shapley Values To Explain Image Models

Jun 14, 2022
Aditya Lahiri, Kamran Alipour, Ehsan Adeli, Babak Salimi


With the widespread use of sophisticated machine learning models in sensitive applications, understanding their decision-making has become an essential task. Explanation methods for models trained on tabular data have seen significant progress, largely because such models operate on a small number of discrete features. However, applying these methods to high-dimensional inputs such as images is not a trivial task: images are composed of pixels at the atomic level and do not carry any interpretability by themselves. In this work, we seek to use annotated, high-level interpretable features of images to provide explanations. We leverage the Shapley value framework from game theory, which has garnered wide acceptance in general XAI problems. By developing a pipeline to generate counterfactuals and subsequently using it to estimate Shapley values, we obtain contrastive and interpretable explanations with strong axiomatic guarantees.

* ICML 2022 Workshop on Responsible Decision Making in Dynamic Environments  
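
A minimal sketch of the underlying computation (not the paper's pipeline): exact Shapley values over a handful of binary, high-level image concepts, where the value function would score a counterfactual image generated with the given concept settings. Here `model_score`, the concept names, and the baseline are hypothetical stand-ins for that counterfactual-generation step.

```python
# Exact Shapley values over a few binary concepts; the real pipeline would
# replace `model_score` with "generate a counterfactual image having these
# concept values and return the classifier's output".
from itertools import combinations
from math import factorial

FEATURES = ["smiling", "glasses", "beard"]

def model_score(concepts):
    # Toy stand-in for classifier output on a generated counterfactual.
    return 0.2 + 0.5 * concepts["smiling"] + 0.1 * concepts["glasses"]

def shapley_values(instance, baseline):
    n = len(FEATURES)
    phi = {f: 0.0 for f in FEATURES}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        for k in range(len(others) + 1):
            for subset in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                with_f = {g: instance[g] if (g in subset or g == f) else baseline[g]
                          for g in FEATURES}
                without_f = {g: instance[g] if g in subset else baseline[g]
                             for g in FEATURES}
                phi[f] += weight * (model_score(with_f) - model_score(without_f))
    return phi

print(shapley_values({"smiling": 1, "glasses": 1, "beard": 0},
                     baseline={"smiling": 0, "glasses": 0, "beard": 0}))
```

With only a few annotated concepts, exact enumeration over all subsets is feasible; as the abstract notes, the substantive work lies in the counterfactual-generation pipeline that makes this value function meaningful for images.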

Explaining Image Classifiers Using Contrastive Counterfactuals in Generative Latent Spaces

Jun 10, 2022
Kamran Alipour, Aditya Lahiri, Ehsan Adeli, Babak Salimi, Michael Pazzani


Despite their high accuracies, modern complex image classifiers cannot be trusted for sensitive tasks due to their unknown decision-making process and potential biases. Counterfactual explanations are very effective in providing transparency for these black-box algorithms. Nevertheless, generating counterfactuals that can have a consistent impact on classifier outputs and yet expose interpretable feature changes is a very challenging task. We introduce a novel method to generate causal and yet interpretable counterfactual explanations for image classifiers using pretrained generative models without any re-training or conditioning. The generative models in this technique are not bound to be trained on the same data as the target classifier. We use this framework to obtain contrastive and causal sufficiency and necessity scores as global explanations for black-box classifiers. On the task of face attribute classification, we show how different attributes influence the classifier output by providing both causal and contrastive feature attributions, and the corresponding counterfactual images.
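
To make sufficiency and necessity scores concrete, here is a hedged Monte Carlo sketch over a toy attribute space; the exact definitions and the latent-space machinery in the paper may differ, and `classify` and `set_attribute` are hypothetical stand-ins for the target classifier and the generative counterfactual generator.

```python
# Illustrative sufficiency/necessity-style scores from counterfactual
# interventions on toy attributes (not the paper's exact definitions).
import random

random.seed(0)

def classify(z):
    # Toy stand-in for the black-box image classifier.
    return 1 if 0.3 * z["smile"] + 0.7 * z["makeup"] > 0.5 else 0

def set_attribute(z, name, value):
    # Stand-in for a latent-space intervention producing a counterfactual.
    cf = dict(z)
    cf[name] = value
    return cf

samples = [{"smile": random.random(), "makeup": random.random()}
           for _ in range(5000)]

# Sufficiency of "smile": among inputs classified 0, how often does forcing
# the attribute on flip the prediction to 1?
neg = [z for z in samples if classify(z) == 0]
sufficiency = sum(classify(set_attribute(z, "smile", 1.0)) for z in neg) / len(neg)

# Necessity: among inputs classified 1, how often does forcing it off flip to 0?
pos = [z for z in samples if classify(z) == 1]
necessity = sum(1 - classify(set_attribute(z, "smile", 0.0)) for z in pos) / len(pos)

print(f"sufficiency ~ {sufficiency:.2f}, necessity ~ {necessity:.2f}")
```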


Interpretable Data-Based Explanations for Fairness Debugging

Dec 17, 2021
Romila Pradhan, Jiongli Zhu, Boris Glavic, Babak Salimi


A wide variety of fairness metrics and eXplainable Artificial Intelligence (XAI) approaches have been proposed in the literature to identify bias in machine learning models that are used in critical real-life contexts. However, merely reporting on a model's bias or generating explanations using existing XAI techniques is insufficient to locate and eventually mitigate sources of bias. In this work, we introduce Gopher, a system that produces compact, interpretable, and causal explanations for bias or unexpected model behavior by identifying coherent subsets of the training data that are root causes for this behavior. Specifically, we introduce the concept of causal responsibility, which quantifies the extent to which intervening on the training data by removing or updating subsets of it can resolve the bias. Building on this concept, we develop an efficient approach for generating the top-k patterns that explain model bias; it uses techniques from the ML community to approximate causal responsibility and pruning rules to manage the large search space of patterns. Our experimental evaluation demonstrates the effectiveness of Gopher in generating interpretable explanations for identifying and debugging sources of bias.

* Proceedings of the 2022 International Conference on Management of Data. ACM, 2022 
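
A brute-force caricature of the data-intervention idea (not Gopher's algorithm, which relies on approximations of causal responsibility and pruning rules rather than retraining per pattern): remove the training rows matching a candidate pattern, retrain, and observe how much a bias metric moves. The synthetic data, the pattern, and the parity metric below are illustrative, and NumPy/scikit-learn are assumed to be available.

```python
# Score a data pattern by how much removing its rows changes a bias metric
# after retraining (illustration only; Gopher avoids brute-force retraining).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 4000
group = rng.integers(0, 2, n)                       # sensitive attribute A
score = rng.normal(loc=group, scale=1.0)            # feature correlated with A
label = (score + rng.normal(0, 0.5, n) > 0.5).astype(int)
pattern = (group == 0) & (score > 0.5)              # a biased slice:
label[pattern] = 0                                  # qualified A=0 rows labeled 0

X = np.column_stack([score, group])

def parity_gap(train_mask):
    model = LogisticRegression().fit(X[train_mask], label[train_mask])
    pred = model.predict(X)
    return pred[group == 1].mean() - pred[group == 0].mean()

print("bias with full training data: ", round(parity_gap(np.ones(n, bool)), 3))
print("bias with the pattern removed:", round(parity_gap(~pattern), 3))
```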

Explaining Black-Box Algorithms Using Probabilistic Contrastive Counterfactuals

Mar 22, 2021
Sainyam Galhotra, Romila Pradhan, Babak Salimi


There has been a recent resurgence of interest in explainable artificial intelligence (XAI) that aims to reduce the opaqueness of AI-based decision-making systems, allowing humans to scrutinize and trust them. Prior work in this context has focused on the attribution of responsibility for an algorithm's decisions to its inputs, wherein responsibility is typically approached as a purely associational concept. In this paper, we propose a principled causality-based approach for explaining black-box decision-making systems that addresses limitations of existing methods in XAI. At the core of our framework lie probabilistic contrastive counterfactuals, a concept that can be traced back to philosophical, cognitive, and social foundations of theories on how humans generate and select explanations. We show how such counterfactuals can quantify the direct and indirect influences of a variable on decisions made by an algorithm, and provide actionable recourse for individuals negatively affected by the algorithm's decision. Unlike prior work, our system, LEWIS: (1) can compute provably effective explanations and recourse at local, global, and contextual levels; (2) is designed to work with users with varying levels of background knowledge of the underlying causal model; and (3) makes no assumptions about the internals of an algorithmic system except for the availability of its input-output data. We empirically evaluate LEWIS on three real-world datasets and show that it generates human-understandable explanations that improve upon state-of-the-art approaches in XAI, including the popular LIME and SHAP. Experiments on synthetic data further demonstrate the correctness of LEWIS's explanations and the scalability of its recourse algorithm.

* Proceedings of the 2021 International Conference on Management of Data. ACM, 2021 
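
The kind of quantity involved can be illustrated with the standard probabilities of causation from the causality literature (Pearl); the paper's precise definitions may differ, but necessity- and sufficiency-style scores are counterfactual probabilities of this general shape:

\[
\mathrm{PN} \;=\; P\big(Y_{X \leftarrow x'} = y' \,\big|\, X = x,\, Y = y\big),
\qquad
\mathrm{PS} \;=\; P\big(Y_{X \leftarrow x} = y \,\big|\, X = x',\, Y = y'\big).
\]

PN (probability of necessity) asks, for an individual who had input x and received outcome y, how likely the outcome would have been y' had the input been x'; PS (probability of sufficiency) asks the converse. Contrastive explanations and actionable recourse can then be phrased as statements about such quantities, estimated from input-output data under causal assumptions.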

Causal Relational Learning

Apr 07, 2020
Babak Salimi, Harsh Parikh, Moe Kayali, Sudeepa Roy, Lise Getoor, Dan Suciu


Causal inference is at the heart of empirical research in natural and social sciences and is critical for scientific discovery and informed decision making. The gold standard in causal inference is performing randomized controlled trials; unfortunately, these are not always feasible due to ethical, legal, or cost constraints. As an alternative, methodologies for causal inference from observational data have been developed in statistical studies and social sciences. However, existing methods critically rely on restrictive assumptions such as the study population consisting of homogeneous elements that can be represented in a single flat table, where each row is referred to as a unit. In contrast, in many real-world settings, the study domain naturally consists of heterogeneous elements with complex relational structure, where the data is naturally represented in multiple related tables. In this paper, we present a formal framework for causal inference from such relational data. We propose a declarative language called CaRL for capturing causal background knowledge and assumptions and specifying causal queries using simple Datalog-like rules. CaRL provides a foundation for inferring causality and reasoning about the effect of complex interventions in relational domains. We present an extensive experimental evaluation on real relational data to illustrate the applicability of CaRL in social sciences and healthcare.


Data Management for Causal Algorithmic Fairness

Oct 01, 2019
Babak Salimi, Bill Howe, Dan Suciu


Fairness is increasingly recognized as a critical component of machine learning systems. However, it is the underlying data on which these systems are trained that often reflects discrimination, suggesting a data management problem. In this paper, we first make a distinction between associational and causal definitions of fairness in the literature and argue that the concept of fairness requires causal reasoning. We then review existing works and identify future opportunities for applying data management techniques to causal algorithmic fairness.

* arXiv admin note: text overlap with arXiv:1902.08283 
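
As an illustration of the associational-versus-causal distinction (using standard definitions from the fairness literature, not necessarily the exact formulations in the paper): statistical parity constrains an observed conditional distribution, whereas counterfactual fairness constrains what the prediction would have been under an intervention on the sensitive attribute S:

\[
P(\hat{Y} = 1 \mid S = s_1) \;=\; P(\hat{Y} = 1 \mid S = s_2)
\quad\text{(statistical parity, associational)}
\]
\[
P\big(\hat{Y}_{S \leftarrow s} = y \,\big|\, X = x,\, S = s\big)
\;=\;
P\big(\hat{Y}_{S \leftarrow s'} = y \,\big|\, X = x,\, S = s\big)
\quad\text{(counterfactual fairness, causal)}
\]

The first condition can be satisfied or violated through purely statistical associations (Simpson's-paradox-style reversals are one failure mode), while the second refers to counterfactual predictions and therefore requires causal assumptions about the data-generating process.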

Capuchin: Causal Database Repair for Algorithmic Fairness

Feb 26, 2019
Babak Salimi, Luke Rodriguez, Bill Howe, Dan Suciu


Fairness is increasingly recognized as a critical component of machine learning systems. However, it is the underlying data on which these systems are trained that often reflect discrimination, suggesting a database repair problem. Existing treatments of fairness rely on statistical correlations that can be fooled by statistical anomalies, such as Simpson's paradox. Proposals for causality-based definitions of fairness can correctly model some of these situations, but they require specification of the underlying causal models. In this paper, we formalize the situation as a database repair problem, proving sufficient conditions for fair classifiers in terms of admissible variables as opposed to a complete causal model. We show that these conditions correctly capture subtle fairness violations. We then use these conditions as the basis for database repair algorithms that provide provable fairness guarantees about classifiers trained on their training labels. We evaluate our algorithms on real data, demonstrating improvement over the state of the art on multiple fairness metrics proposed in the literature while retaining high utility.
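
To make the target condition concrete: the sufficient conditions sketched in the abstract amount, roughly, to the training label being independent of the protected attribute given the admissible variables. Below is a hedged reweighting illustration of that conditional independence; Capuchin itself frames this as a database repair problem (modifying the training database) rather than reweighting, so this is a simplified stand-in, and the attribute names are hypothetical.

```python
# Reweight training rows so that label Y becomes independent of sensitive
# attribute S given admissible attribute A in the reweighted distribution.
# Illustration only; NOT Capuchin's repair algorithm.
from collections import Counter

# Toy training rows: (admissible A, sensitive S, label Y).
rows = ([("hi", "f", 1)] * 30 + [("hi", "f", 0)] * 10 +
        [("hi", "m", 1)] * 35 + [("hi", "m", 0)] * 5 +
        [("lo", "f", 1)] * 5  + [("lo", "f", 0)] * 25 +
        [("lo", "m", 1)] * 15 + [("lo", "m", 0)] * 15)

joint  = Counter(rows)                          # counts of (A, S, Y)
a_cnt  = Counter(a for a, s, y in rows)         # counts of A
as_cnt = Counter((a, s) for a, s, y in rows)    # counts of (A, S)
ay_cnt = Counter((a, y) for a, s, y in rows)    # counts of (A, Y)

def weight(a, s, y):
    # w = P(S=s|A=a) * P(Y=y|A=a) / P(S=s,Y=y|A=a): makes S and Y
    # conditionally independent given A once each row is reweighted.
    p_s  = as_cnt[(a, s)] / a_cnt[a]
    p_y  = ay_cnt[(a, y)] / a_cnt[a]
    p_sy = joint[(a, s, y)] / a_cnt[a]
    return p_s * p_y / p_sy

# After reweighting, P(Y=1 | A=a, S=s) no longer depends on S.
for a in ("hi", "lo"):
    for s in ("f", "m"):
        w1 = joint[(a, s, 1)] * weight(a, s, 1)
        w0 = joint[(a, s, 0)] * weight(a, s, 0)
        print(a, s, round(w1 / (w1 + w0), 3))
```

As the abstract notes, the repair-based formulation provides provable fairness guarantees for classifiers trained on the repaired training data, while requiring only admissible variables rather than a complete causal model.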


A Framework for Inferring Causality from Multi-Relational Observational Data using Conditional Independence

Aug 08, 2017
Sudeepa Roy, Babak Salimi


The study of causality or causal inference - how much a given treatment causally affects a given outcome in a population - goes well beyond correlation or association analysis of variables, and is critical for making sound data-driven decisions and policies in a multitude of applications. The gold standard in causal inference is performing "controlled experiments", which often is not possible due to logistical or ethical reasons. As an alternative, inferring causality on "observational data" based on the "Neyman-Rubin potential outcome model" has been extensively used in statistics, economics, and social sciences over several decades. In this paper, we present a formal framework for sound causal analysis on observational datasets that are given as multiple relations and where the population under study is obtained by joining these base relations. We study a crucial condition for inferring causality from observational data, called the "strong ignorability assumption" (the treatment and outcome variables should be independent in the joined relation given the observed covariates), using known conditional independences that hold in the base relations. We also discuss how the structure of the conditional independences in base relations, given as graphical models, helps infer new conditional independences in the joined relation. The proposed framework combines concepts from databases, statistics, and graphical models, and aims to initiate new research directions spanning these fields to facilitate powerful data-driven decisions in today's big data world.
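
For reference, the strong ignorability condition discussed above is standardly written in Neyman-Rubin potential-outcome notation as

\[
\big(Y(0),\, Y(1)\big) \;\perp\!\!\!\perp\; T \,\mid\, X
\qquad\text{and}\qquad
0 < P(T = 1 \mid X = x) < 1 \;\;\text{for all } x,
\]

i.e., the potential outcomes are independent of treatment assignment given the observed covariates, together with overlap. The question studied here is when such a condition can be certified for the joined relation using only conditional independences known to hold in the base relations (for instance, as encoded by per-relation graphical models).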
