Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Toon Calders

No evaluation without fair representation : Impact of label and selection bias on the evaluation, performance and mitigation of classification models

Mar 10, 2026

Magali Legast, Toon Calders, François Fouss

Abstract:Bias can be introduced in diverse ways in machine learning datasets, for example via selection or label bias. Although these bias types in themselves have an influence on important aspects of fair machine learning, their different impact has been understudied. In this work, we empirically analyze the effect of label bias and several subtypes of selection bias on the evaluation of classification models, on their performance, and on the effectiveness of bias mitigation methods. We also introduce a biasing and evaluation framework that allows to model fair worlds and their biased counterparts through the introduction of controlled bias in real-life datasets with low discrimination. Using our framework, we empirically analyze the impact of each bias type independently, while obtaining a more representative evaluation of models and mitigation methods than with the traditional use of a subset of biased data as test set. Our results highlight different factors that influence how impactful bias is on model performance. They also show an absence of trade-off between fairness and accuracy, and between individual and group fairness, when models are evaluated on a test set that does not exhibit unwanted bias. They furthermore indicate that the performance of bias mitigation methods is influenced by the type of bias present in the data. Our findings call for future work to develop more accurate evaluations of prediction models and fairness interventions, but also to better understand other types of bias, more complex scenarios involving the combination of different bias types, and other factors that impact the efficiency of the mitigation methods, such as dataset characteristics.

* 31 pages, 14 figures + appendix Submitted to the ACM Journal on Responsible Computing

Via

Access Paper or Ask Questions

Would a Large Language Model Pay Extra for a View? Inferring Willingness to Pay from Subjective Choices

Feb 10, 2026

Manon Reusens, Sofie Goethals, Toon Calders, David Martens

Abstract:As Large Language Models (LLMs) are increasingly deployed in applications such as travel assistance and purchasing support, they are often required to make subjective choices on behalf of users in settings where no objectively correct answer exists. We study LLM decision-making in a travel-assistant context by presenting models with choice dilemmas and analyzing their responses using multinomial logit models to derive implied willingness to pay (WTP) estimates. These WTP values are subsequently compared to human benchmark values from the economics literature. In addition to a baseline setting, we examine how model behavior changes under more realistic conditions, including the provision of information about users' past choices and persona-based prompting. Our results show that while meaningful WTP values can be derived for larger LLMs, they also display systematic deviations at the attribute level. Additionally, they tend to overestimate human WTP overall, particularly when expensive options or business-oriented personas are introduced. Conditioning models on prior preferences for cheaper options yields valuations that are closer to human benchmarks. Overall, our findings highlight both the potential and the limitations of using LLMs for subjective decision support and underscore the importance of careful model selection, prompt design, and user representation when deploying such systems in practice.

Via

Access Paper or Ask Questions

Interpretable and Fair Mechanisms for Abstaining Classifiers

Mar 24, 2025

Daphne Lenders, Andrea Pugnana, Roberto Pellungrini, Toon Calders, Dino Pedreschi, Fosca Giannotti

Abstract:Abstaining classifiers have the option to refrain from providing a prediction for instances that are difficult to classify. The abstention mechanism is designed to trade off the classifier's performance on the accepted data while ensuring a minimum number of predictions. In this setting, often fairness concerns arise when the abstention mechanism solely reduces errors for the majority groups of the data, resulting in increased performance differences across demographic groups. While there exist a bunch of methods that aim to reduce discrimination when abstaining, there is no mechanism that can do so in an explainable way. In this paper, we fill this gap by introducing Interpretable and Fair Abstaining Classifier IFAC, an algorithm that can reject predictions both based on their uncertainty and their unfairness. By rejecting possibly unfair predictions, our method reduces error and positive decision rate differences across demographic groups of the non-rejected data. Since the unfairness-based rejections are based on an interpretable-by-design method, i.e., rule-based fairness checks and situation testing, we create a transparent process that can empower human decision-makers to review the unfair predictions and make more just decisions for them. This explainable aspect is especially important in light of recent AI regulations, mandating that any high-risk decision task should be overseen by human experts to reduce discrimination risks.

* 25 pages, 8 figures. In: Machine Learning and Knowledge Discovery in Databases. Research Track. ECML PKDD 2024

Via

Access Paper or Ask Questions

"Patriarchy Hurts Men Too." Does Your Model Agree? A Discussion on Fairness Assumptions

Aug 01, 2024

Marco Favier, Toon Calders

Abstract:The pipeline of a fair ML practitioner is generally divided into three phases: 1) Selecting a fairness measure. 2) Choosing a model that minimizes this measure. 3) Maximizing the model's performance on the data. In the context of group fairness, this approach often obscures implicit assumptions about how bias is introduced into the data. For instance, in binary classification, it is often assumed that the best model, with equal fairness, is the one with better performance. However, this belief already imposes specific properties on the process that introduced bias. More precisely, we are already assuming that the biasing process is a monotonic function of the fair scores, dependent solely on the sensitive attribute. We formally prove this claim regarding several implicit fairness assumptions. This leads, in our view, to two possible conclusions: either the behavior of the biasing process is more complex than mere monotonicity, which means we need to identify and reject our implicit assumptions in order to develop models capable of tackling more complex situations; or the bias introduced in the data behaves predictably, implying that many of the developed models are superfluous.

Via

Access Paper or Ask Questions

FairFlow: An Automated Approach to Model-based Counterfactual Data Augmentation For NLP

Jul 23, 2024

Ewoenam Kwaku Tokpo, Toon Calders

Figure 1 for FairFlow: An Automated Approach to Model-based Counterfactual Data Augmentation For NLP

Figure 2 for FairFlow: An Automated Approach to Model-based Counterfactual Data Augmentation For NLP

Figure 3 for FairFlow: An Automated Approach to Model-based Counterfactual Data Augmentation For NLP

Figure 4 for FairFlow: An Automated Approach to Model-based Counterfactual Data Augmentation For NLP

Abstract:Despite the evolution of language models, they continue to portray harmful societal biases and stereotypes inadvertently learned from training data. These inherent biases often result in detrimental effects in various applications. Counterfactual Data Augmentation (CDA), which seeks to balance demographic attributes in training data, has been a widely adopted approach to mitigate bias in natural language processing. However, many existing CDA approaches rely on word substitution techniques using manually compiled word-pair dictionaries. These techniques often lead to out-of-context substitutions, resulting in potential quality issues. The advancement of model-based techniques, on the other hand, has been challenged by the need for parallel training data. Works in this area resort to manually generated parallel data that are expensive to collect and are consequently limited in scale. This paper proposes FairFlow, an automated approach to generating parallel data for training counterfactual text generator models that limits the need for human intervention. Furthermore, we show that FairFlow significantly overcomes the limitations of dictionary-based word-substitution approaches whilst maintaining good performance.

Via

Access Paper or Ask Questions

Cherry on the Cake: Fairness is NOT an Optimization Problem

Jun 24, 2024

Marco Favier, Toon Calders

Figure 1 for Cherry on the Cake: Fairness is NOT an Optimization Problem

Figure 2 for Cherry on the Cake: Fairness is NOT an Optimization Problem

Figure 3 for Cherry on the Cake: Fairness is NOT an Optimization Problem

Abstract:Fair cake-cutting is a mathematical subfield that studies the problem of fairly dividing a resource among a number of participants. The so-called ``cake,'' as an object, represents any resource that can be distributed among players. This concept is connected to supervised multi-label classification: any dataset can be thought of as a cake that needs to be distributed, where each label is a player that receives its share of the dataset. In particular, any efficient cake-cutting solution for the dataset is equivalent to an optimal decision function. Although we are not the first to demonstrate this connection, the important ramifications of this parallel seem to have been partially forgotten. We revisit these classical results and demonstrate how this connection can be prolifically used for fairness in machine learning problems. Understanding the set of achievable fair decisions is a fundamental step in finding optimal fair solutions and satisfying fairness requirements. By employing the tools of cake-cutting theory, we have been able to describe the behavior of optimal fair decisions, which, counterintuitively, often exhibit quite unfair properties. Specifically, in order to satisfy fairness constraints, it is sometimes preferable, in the name of optimality, to purposefully make mistakes and deny giving the positive label to deserving individuals in a community in favor of less worthy individuals within the same community. This practice is known in the literature as cherry-picking and has been described as ``blatantly unfair.''

Via

Access Paper or Ask Questions

How to be fair? A study of label and selection bias

Mar 21, 2024

Marco Favier, Toon Calders, Sam Pinxteren, Jonathan Meyer

Abstract:It is widely accepted that biased data leads to biased and thus potentially unfair models. Therefore, several measures for bias in data and model predictions have been proposed, as well as bias mitigation techniques whose aim is to learn models that are fair by design. Despite the myriad of mitigation techniques developed in the past decade, however, it is still poorly understood under what circumstances which methods work. Recently, Wick et al. showed, with experiments on synthetic data, that there exist situations in which bias mitigation techniques lead to more accurate models when measured on unbiased data. Nevertheless, in the absence of a thorough mathematical analysis, it remains unclear which techniques are effective under what circumstances. We propose to address this problem by establishing relationships between the type of bias and the effectiveness of a mitigation technique, where we categorize the mitigation techniques by the bias measure they optimize. In this paper we illustrate this principle for label and selection bias on the one hand, and demographic parity and ``We're All Equal'' on the other hand. Our theoretical analysis allows to explain the results of Wick et al. and we also show that there are situations where minimizing fairness measures does not result in the fairest possible distribution.

Via

Access Paper or Ask Questions

Beyond Accuracy-Fairness: Stop evaluating bias mitigation methods solely on between-group metrics

Jan 24, 2024

Sofie Goethals, Toon Calders, David Martens

Figure 1 for Beyond Accuracy-Fairness: Stop evaluating bias mitigation methods solely on between-group metrics

Figure 2 for Beyond Accuracy-Fairness: Stop evaluating bias mitigation methods solely on between-group metrics

Figure 3 for Beyond Accuracy-Fairness: Stop evaluating bias mitigation methods solely on between-group metrics

Figure 4 for Beyond Accuracy-Fairness: Stop evaluating bias mitigation methods solely on between-group metrics

Abstract:Artificial Intelligence (AI) finds widespread applications across various domains, sparking concerns about fairness in its deployment. While fairness in AI remains a central concern, the prevailing discourse often emphasizes outcome-based metrics without a nuanced consideration of the differential impacts within subgroups. Bias mitigation techniques do not only affect the ranking of pairs of instances across sensitive groups, but often also significantly affect the ranking of instances within these groups. Such changes are hard to explain and raise concerns regarding the validity of the intervention. Unfortunately, these effects largely remain under the radar in the accuracy-fairness evaluation framework that is usually applied. This paper challenges the prevailing metrics for assessing bias mitigation techniques, arguing that they do not take into account the changes within-groups and that the resulting prediction labels fall short of reflecting real-world scenarios. We propose a paradigm shift: initially, we should focus on generating the most precise ranking for each subgroup. Following this, individuals should be chosen from these rankings to meet both fairness standards and practical considerations.

Via

Access Paper or Ask Questions

Model-based Counterfactual Generator for Gender Bias Mitigation

Nov 06, 2023

Ewoenam Kwaku Tokpo, Toon Calders

Figure 1 for Model-based Counterfactual Generator for Gender Bias Mitigation

Figure 2 for Model-based Counterfactual Generator for Gender Bias Mitigation

Figure 3 for Model-based Counterfactual Generator for Gender Bias Mitigation

Figure 4 for Model-based Counterfactual Generator for Gender Bias Mitigation

Abstract:Counterfactual Data Augmentation (CDA) has been one of the preferred techniques for mitigating gender bias in natural language models. CDA techniques have mostly employed word substitution based on dictionaries. Although such dictionary-based CDA techniques have been shown to significantly improve the mitigation of gender bias, in this paper, we highlight some limitations of such dictionary-based counterfactual data augmentation techniques, such as susceptibility to ungrammatical compositions, and lack of generalization outside the set of predefined dictionary words. Model-based solutions can alleviate these problems, yet the lack of qualitative parallel training data hinders development in this direction. Therefore, we propose a combination of data processing techniques and a bi-objective training regime to develop a model-based solution for generating counterfactuals to mitigate gender bias. We implemented our proposed solution and performed an empirical evaluation which shows how our model alleviates the shortcomings of dictionary-based solutions.

Via

Access Paper or Ask Questions

How Far Can It Go?: On Intrinsic Gender Bias Mitigation for Text Classification

Jan 30, 2023

Ewoenam Tokpo, Pieter Delobelle, Bettina Berendt, Toon Calders

Figure 1 for How Far Can It Go?: On Intrinsic Gender Bias Mitigation for Text Classification

Figure 2 for How Far Can It Go?: On Intrinsic Gender Bias Mitigation for Text Classification

Figure 3 for How Far Can It Go?: On Intrinsic Gender Bias Mitigation for Text Classification

Figure 4 for How Far Can It Go?: On Intrinsic Gender Bias Mitigation for Text Classification

Abstract:To mitigate gender bias in contextualized language models, different intrinsic mitigation strategies have been proposed, alongside many bias metrics. Considering that the end use of these language models is for downstream tasks like text classification, it is important to understand how these intrinsic bias mitigation strategies actually translate to fairness in downstream tasks and the extent of this. In this work, we design a probe to investigate the effects that some of the major intrinsic gender bias mitigation strategies have on downstream text classification tasks. We discover that instead of resolving gender bias, intrinsic mitigation techniques and metrics are able to hide it in such a way that significant gender information is retained in the embeddings. Furthermore, we show that each mitigation technique is able to hide the bias from some of the intrinsic bias measures but not all, and each intrinsic bias measure can be fooled by some mitigation techniques, but not all. We confirm experimentally, that none of the intrinsic mitigation techniques used without any other fairness intervention is able to consistently impact extrinsic bias. We recommend that intrinsic bias mitigation techniques should be combined with other fairness interventions for downstream tasks.

Via

Access Paper or Ask Questions