João Vitor Pamplona

FairML: A Julia Package for Fair Classification

Dec 03, 2024

Jan Pablo Burgard, João Vitor Pamplona

Abstract: In this paper, we propose FairML.jl, a Julia package providing a framework for fair classification in machine learning. In this framework, the fair learning process is divided into three stages. Each stage aims to reduce unfairness, such as disparate impact and disparate mistreatment, in the final prediction. For the preprocessing stage, we present a resampling method that addresses unfairness coming from data imbalances. The in-processing stage consists of a classification method, which can either come from the MLJ.jl package or be user-defined. For this stage, we incorporate fair ML methods that can handle unfairness to a certain degree through their optimization process. For the post-processing stage, we discuss the choice of the cut-off value for fair prediction. With simulations, we show the performance of the individual stages and their combinations.

* 25 pages, 8 figures
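
The post-processing idea in the abstract above (choosing a cut-off value with fairness in mind) can be illustrated with a minimal Python sketch. This is not FairML.jl's API or the authors' method: the function names, the ratio form of disparate impact, and the selection rule are all assumptions made for illustration, and the paper's exact definitions may differ.

```python
from typing import Sequence


def positive_rate(scores: Sequence[float], groups: Sequence[str],
                  group: str, cutoff: float) -> float:
    """Share of members of `group` whose score is at or above the cut-off."""
    selected = [s >= cutoff for s, g in zip(scores, groups) if g == group]
    return sum(selected) / len(selected)


def disparate_impact(scores: Sequence[float], groups: Sequence[str],
                     cutoff: float) -> float:
    """Ratio of the smallest to the largest group-wise positive rate.

    Assumption: this ratio form is one common way to quantify disparate
    impact (1.0 means parity); other definitions use a difference instead.
    """
    rates = [positive_rate(scores, groups, g, cutoff) for g in sorted(set(groups))]
    highest = max(rates)
    return 1.0 if highest == 0 else min(rates) / highest


def accuracy(scores: Sequence[float], labels: Sequence[int], cutoff: float) -> float:
    """Fraction of observations classified correctly at the given cut-off."""
    hits = [int(s >= cutoff) == y for s, y in zip(scores, labels)]
    return sum(hits) / len(hits)


def choose_cutoff(scores: Sequence[float], groups: Sequence[str],
                  labels: Sequence[int], candidates: Sequence[float],
                  min_ratio: float) -> float:
    """Hypothetical rule: most accurate cut-off whose disparate-impact ratio
    is at least `min_ratio`; if none qualifies, the fairest candidate."""
    feasible = [c for c in candidates
                if disparate_impact(scores, groups, c) >= min_ratio]
    if feasible:
        return max(feasible, key=lambda c: accuracy(scores, labels, c))
    return max(candidates, key=lambda c: disparate_impact(scores, groups, c))


# Toy data, invented for illustration only.
scores = [0.9, 0.8, 0.4, 0.3, 0.7, 0.45, 0.35, 0.2]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
labels = [1, 1, 0, 0, 1, 1, 0, 0]

best = choose_cutoff(scores, groups, labels, [0.3, 0.4, 0.5], min_ratio=0.6)
```

In this toy example the cut-off 0.5 is excluded because its positive-rate ratio falls below the chosen threshold, so the rule picks the more accurate of the remaining candidates.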


Fair Generalized Linear Mixed Models

May 15, 2024

Jan Pablo Burgard, João Vitor Pamplona


Abstract: When using machine learning for automated prediction, it is important to account for fairness in the prediction. Fairness in machine learning aims to ensure that biases in the data and model inaccuracies do not lead to discriminatory decisions. For example, predictions from fair machine learning models should not discriminate on the basis of sensitive variables such as sexual orientation and ethnicity. The training data is often obtained from social surveys. In social surveys, the data collection process is often a stratified sampling, e.g. due to cost restrictions. In stratified samples, the assumption of independence between the observations is not fulfilled. Hence, if the machine learning models do not account for the strata correlations, the results may be biased. The bias is especially high in cases where the strata assignment is correlated with the variable of interest. We present in this paper an algorithm that can handle both problems simultaneously, and we demonstrate the impact of stratified sampling on the quality of fair machine learning predictions in a reproducible simulation study.

* 25 pages, 12 figures. arXiv admin note: text overlap with arXiv:2405.06433


Fair Mixed Effects Support Vector Machine

May 10, 2024

João Vitor Pamplona, Jan Pablo Burgard


Abstract: To ensure unbiased and ethical automated predictions, fairness must be a core principle in machine learning applications. Fairness in machine learning aims to mitigate biases present in the training data and model imperfections that could lead to discriminatory outcomes. This is achieved by preventing the model from making decisions based on sensitive characteristics like ethnicity or sexual orientation. A fundamental assumption in machine learning is the independence of observations. However, this assumption often does not hold for data describing social phenomena, where data points are often clustered. Hence, if the machine learning models do not account for the cluster correlations, the results may be biased. The bias is especially high in cases where the cluster assignment is correlated with the variable of interest. We present a fair mixed effects support vector machine algorithm that can handle both problems simultaneously. With a reproducible simulation study, we demonstrate the impact of clustered data on the quality of fair machine learning predictions.

* 13 pages, 6 figures
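
To make the "cluster correlations" mentioned in the abstract above concrete, the following Python sketch simulates clustered data under a simple random-intercept model and computes the implied within-cluster correlation. The model, the function names, and the parameter values are assumptions chosen for illustration; they are not taken from the paper's simulation study.

```python
import random
from typing import List, Tuple


def simulate_clustered(n_clusters: int, n_per_cluster: int,
                       sigma_b: float, sigma_e: float,
                       seed: int = 0) -> List[Tuple[int, float]]:
    """Draw y_ij = b_i + e_ij with a shared random intercept b_i per cluster.

    Assumption: a Gaussian random-intercept model, the simplest case of a
    mixed effects model; observations in one cluster share the same b_i.
    """
    rng = random.Random(seed)
    data = []
    for cluster in range(n_clusters):
        b = rng.gauss(0.0, sigma_b)          # cluster-level effect
        for _ in range(n_per_cluster):
            e = rng.gauss(0.0, sigma_e)      # observation-level noise
            data.append((cluster, b + e))
    return data


def intraclass_correlation(sigma_b: float, sigma_e: float) -> float:
    """Correlation between two observations from the same cluster
    under the random-intercept model above."""
    return sigma_b ** 2 / (sigma_b ** 2 + sigma_e ** 2)


def design_effect(n_per_cluster: int, icc: float) -> float:
    """Kish design effect: variance inflation of a sample mean relative
    to an independent sample of the same size."""
    return 1.0 + (n_per_cluster - 1) * icc


sample = simulate_clustered(n_clusters=20, n_per_cluster=10,
                            sigma_b=1.0, sigma_e=1.0)
icc = intraclass_correlation(1.0, 1.0)
deff = design_effect(10, icc)
```

With equal variances at both levels, half of the total variance is shared within a cluster, so treating the 200 simulated observations as independent would noticeably overstate the information they carry.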
