Abstract:Research about bias in machine learning has mostly focused on outcome-oriented fairness metrics (e.g., equalized odds) and on a single protected category. Although these approaches offer great insight into bias in ML, they provide limited insight into model procedure bias. To address this gap, we proposed multi-category explanation stability disparity (MESD), an intersectional, procedurally oriented metric that measures the disparity in the quality of explanations across intersectional subgroups in multiple protected categories. MESD serves as a complementary metric to outcome-oriented metrics, providing detailed insight into the procedure of a model. To further extend the scope of the holistic selection model, we also propose a multi-objective optimization framework, UEF (Utility-Explanation-Fairness), that jointly optimizes three objectives. Experimental results across multiple datasets show that UEF effectively balances objectives. Also, the results show that MESD can effectively capture the explanation difference between intersectional groups. This research addresses an important gap by examining explainability with respect to fairness across multiple protected categories.
Abstract:Fairness in machine learning research has largely focused on outcome-oriented fairness criteria such as Equalized Odds, while comparatively less attention has been given to procedural-oriented fairness, which addresses how a model arrives at its predictions. Neglecting procedural fairness means it is possible for a model to generate different explanations for different protected groups, thereby eroding trust. In this work, we introduce Group Counterfactual Integrated Gradients (GCIG), an in-processing regularization framework that enforces explanation invariance across groups, conditioned on the true label. For each input, GCIG computes explanations relative to multiple Group Conditional baselines and penalizes cross-group variation in these attributions during training. GCIG formalizes procedural fairness as Group Counterfactual explanation stability and complements existing fairness objectives that constrain predictions alone. We compared GCIG empirically against six state-of-the-art methods, and the results show that GCIG substantially reduces cross-group explanation disparity while maintaining competitive predictive performance and accuracy-fairness trade-offs. Our results also show that aligning model reasoning across groups offers a principled and practical avenue for advancing fairness beyond outcome parity.