Abstract:Explainability is crucial for improving the transparency of black-box machine learning models. With the advancement of explanation methods such as LIME and SHAP, various XAI performance metrics have been developed to evaluate the quality of explanations. However, different explainers can provide contrasting explanations for the same prediction, introducing trade-offs across conflicting quality metrics. Although available aggregation approaches improve robustness, reducing explanations' variability, very limited research employed a multi-criteria decision-making approach. To address this gap, this paper introduces a multi-criteria rank-based weighted aggregation method that balances multiple quality metrics simultaneously to produce an ensemble of explanation models. Furthermore, we propose rank-based versions of existing XAI metrics (complexity, faithfulness and stability) to better evaluate ranked feature importance explanations. Extensive experiments on publicly available datasets demonstrate the robustness of the proposed model across these metrics. Comparative analyses of various multi-criteria decision-making and rank aggregation algorithms showed that TOPSIS and WSUM are the best candidates for this use case.
Abstract:In recent years, fairness in machine learning has emerged as a critical concern to ensure that developed and deployed predictive models do not have disadvantageous predictions for marginalized groups. It is essential to mitigate discrimination against individuals based on protected attributes such as gender and race. In this work, we consider applying subgroup justice concepts to gradient-boosting machines designed for supervised learning problems. Our approach expanded gradient-boosting methodologies to explore a broader range of objective functions, which combines conventional losses such as the ones from classification and regression and a min-max fairness term. We study relevant theoretical properties of the solution of the min-max optimization problem. The optimization process explored the primal-dual problems at each boosting round. This generic framework can be adapted to diverse fairness concepts. The proposed min-max primal-dual gradient boosting algorithm was theoretically shown to converge under mild conditions and empirically shown to be a powerful and flexible approach to address binary and subgroup fairness.
Abstract:The widespread use of machine learning in credit scoring has brought significant advancements in risk assessment and decision-making. However, it has also raised concerns about potential biases, discrimination, and lack of transparency in these automated systems. This tutorial paper performed a non-systematic literature review to guide best practices for developing responsible machine learning models in credit scoring, focusing on fairness, reject inference, and explainability. We discuss definitions, metrics, and techniques for mitigating biases and ensuring equitable outcomes across different groups. Additionally, we address the issue of limited data representativeness by exploring reject inference methods that incorporate information from rejected loan applications. Finally, we emphasize the importance of transparency and explainability in credit models, discussing techniques that provide insights into the decision-making process and enable individuals to understand and potentially improve their creditworthiness. By adopting these best practices, financial institutions can harness the power of machine learning while upholding ethical and responsible lending practices.