Abstract: A key challenge in employing data, algorithms, and data-driven systems is adhering to the principles of fairness and justice. Statistical fairness measures form an important category of technical/formal mechanisms for detecting fairness issues in data and algorithms. In this contribution we study the relations between two types of statistical fairness measures, namely Statistical-Parity and Equalized-Odds. The Statistical-Parity measure does not rely on having ground truth, i.e., (objectively) labeled target attributes. This makes Statistical-Parity a suitable measure in practice for assessing fairness in data and data-classification algorithms, and it is therefore adopted in many legal and professional frameworks for assessing algorithmic fairness. The Equalized-Odds measure, by contrast, relies on having (reliable) ground truth, which is not always feasible in practice. Nevertheless, there are several situations where the Equalized-Odds definition should be satisfied to enforce false-prediction parity among sensitive social groups. We present a novel analysis of the relation between Statistical-Parity and Equalized-Odds as a function of the base rates of the sensitive groups. The analysis intuitively shows how and when base-rate imbalance causes incompatibility between the Statistical-Parity and Equalized-Odds measures. As such, our approach provides insight into how to make design trade-offs between these measures in practice. Further, based on our results, we plead for examining base-rate (im)balance and investigating the possibility of such an incompatibility before enforcing or relying on the Statistical-Parity criterion. The insights provided may, we foresee, trigger initiatives to improve or adjust current practice and/or the existing legal frameworks.
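
For context, the standard formulations of the two criteria can be stated as follows (the notation here is a generic sketch chosen for illustration, not necessarily that of the paper). For a binary classifier with prediction \(\hat{Y}\), true label \(Y\), and sensitive attribute \(A \in \{0,1\}\):
\[
\text{Statistical Parity:}\quad P(\hat{Y}=1 \mid A=0) \;=\; P(\hat{Y}=1 \mid A=1),
\]
\[
\text{Equalized Odds:}\quad P(\hat{Y}=1 \mid Y=y, A=0) \;=\; P(\hat{Y}=1 \mid Y=y, A=1), \qquad y \in \{0,1\}.
\]
The role of the base rates \(p_a = P(Y=1 \mid A=a)\) follows from the law of total probability: writing \(\mathrm{TPR}_a\) and \(\mathrm{FPR}_a\) for the group-wise true- and false-positive rates,
\[
P(\hat{Y}=1 \mid A=a) \;=\; \mathrm{TPR}_a\, p_a + \mathrm{FPR}_a\,(1-p_a).
\]
If Equalized-Odds holds (equal \(\mathrm{TPR}\) and \(\mathrm{FPR}\) across groups) while \(p_0 \neq p_1\), the selection rates \(P(\hat{Y}=1 \mid A=a) = \mathrm{FPR} + (\mathrm{TPR}-\mathrm{FPR})\,p_a\) differ between groups unless \(\mathrm{TPR} = \mathrm{FPR}\), i.e., unless the classifier is uninformative; hence Statistical-Parity is then violated.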




Abstract: Machine learning models use high-dimensional feature spaces to map their inputs to the corresponding class labels. However, these features often do not have a one-to-one correspondence with physical concepts understandable by humans, which hinders the ability to provide a meaningful explanation for the decisions made by these models. We propose a method for measuring the correlation between high-level concepts and the decisions made by a machine learning model. Our method can isolate the impact of a given high-level concept and measure it quantitatively and accurately. Additionally, this study aims to determine the prevalence of frequent patterns in machine learning models, which often arise in imbalanced datasets. We have successfully applied the proposed method to fundus images and quantitatively measured the impact of radiomic patterns on the model's decisions.
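
As a purely illustrative sketch (not the authors' actual procedure, whose details are not given in the abstract), one simple way to quantify the association between the presence of a high-level concept and a model's decisions is to correlate a per-sample concept indicator with the model's predicted probability:

```python
# Illustrative sketch only: correlating a hypothetical per-image concept
# indicator with a model's predicted probabilities. All names and data
# below are placeholders, not taken from the paper.
import numpy as np

def concept_decision_correlation(probs: np.ndarray, concept: np.ndarray) -> float:
    """Pearson correlation between a binary concept indicator (0/1 per sample)
    and the model's predicted probability for the positive class."""
    probs = np.asarray(probs, dtype=float)
    concept = np.asarray(concept, dtype=float)
    if probs.std() == 0 or concept.std() == 0:
        return 0.0  # correlation undefined when either variable is constant
    return float(np.corrcoef(concept, probs)[0, 1])

# Synthetic example: 200 samples, concept present in roughly 30% of them,
# with model outputs mildly shifted when the concept is present.
rng = np.random.default_rng(0)
concept = rng.binomial(1, 0.3, size=200)
probs = np.clip(rng.normal(0.5 + 0.2 * concept, 0.1), 0.0, 1.0)
print(concept_decision_correlation(probs, concept))
```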