Algorithmic recourse recommends a cost-efficient action to a subject to reverse an unfavorable machine learning classification decision. Most existing methods in the literature generate recourse under the assumption of complete knowledge about the cost function. In real-world practice, subjects could have distinct preferences, leading to incomplete information about the underlying cost function of the subject. This paper proposes a two-step approach integrating preference learning into the recourse generation problem. In the first step, we design a question-answering framework to refine the confidence set of the Mahalanobis matrix cost of the subject sequentially. Then, we generate recourse by utilizing two methods: gradient-based and graph-based cost-adaptive recourse that ensures validity while considering the whole confidence set of the cost matrix. The numerical evaluation demonstrates the benefits of our approach over state-of-the-art baselines in delivering cost-efficient recourse recommendations.
Algorithmic recourse emerges as a prominent technique to promote the explainability, transparency and hence ethics of machine learning models. Existing algorithmic recourse approaches often assume an invariant predictive model; however, the predictive model is usually updated upon the arrival of new data. Thus, a recourse that is valid respective to the present model may become invalid for the future model. To resolve this issue, we propose a novel framework to generate a model-agnostic recourse that exhibits robustness to model shifts. Our framework first builds a coverage-validity-aware linear surrogate of the nonlinear (black-box) model; then, the recourse is generated with respect to the linear surrogate. We establish a theoretical connection between our coverage-validity-aware linear surrogate and the minimax probability machines (MPM). We then prove that by prescribing different covariance robustness, the proposed framework recovers popular regularizations for MPM, including the $\ell_2$-regularization and class-reweighting. Furthermore, we show that our surrogate pushes the approximate hyperplane intuitively, facilitating not only robust but also interpretable recourses. The numerical results demonstrate the usefulness and robustness of our framework.
We introduce Dynamic Tiling, a model-agnostic, adaptive, and scalable approach for small object detection, anchored in our inference-data-centric philosophy. Dynamic Tiling starts with non-overlapping tiles for initial detections and utilizes dynamic overlapping rates along with a tile minimizer. This dual approach effectively resolves fragmented objects, improves detection accuracy, and minimizes computational overhead by reducing the number of forward passes through the object detection model. Adaptable to a variety of operational environments, our method negates the need for laborious recalibration. Additionally, our large-small filtering mechanism boosts the detection quality across a range of object sizes. Overall, Dynamic Tiling outperforms existing model-agnostic uniform cropping methods, setting new benchmarks for efficiency and accuracy.
Explaining algorithmic decisions and recommending actionable feedback is increasingly important for machine learning applications. Recently, significant efforts have been invested in finding a diverse set of recourses to cover the wide spectrum of users' preferences. However, existing works often neglect the requirement that the recourses should be close to the data manifold; hence, the constructed recourses might be implausible and unsatisfying to users. To address these issues, we propose a novel approach that explicitly directs the diverse set of actionable recourses towards the data manifold. We first find a diverse set of prototypes in the favorable class that balances the trade-off between diversity and proximity. We demonstrate two specific methods to find these prototypes: either by finding the maximum a posteriori estimate of a determinantal point process or by solving a quadratic binary program. To ensure the actionability constraints, we construct an actionability graph in which the nodes represent the training samples and the edges indicate the feasible action between two instances. We then find a feasible path to each prototype, and this path demonstrates the feasible actions for each recourse in the plan. The experimental results show that our method produces a set of recourses that are close to the data manifold while delivering a better cost-diversity trade-off than existing approaches.
A recourse action aims to explain a particular algorithmic decision by showing one specific way in which the instance could be modified to receive an alternate outcome. Existing recourse generation methods often assume that the machine learning model does not change over time. However, this assumption does not always hold in practice because of data distribution shifts, and in this case, the recourse action may become invalid. To redress this shortcoming, we propose the Distributionally Robust Recourse Action (DiRRAc) framework, which generates a recourse action that has a high probability of being valid under a mixture of model shifts. We formulate the robustified recourse setup as a min-max optimization problem, where the max problem is specified by Gelbrich distance over an ambiguity set around the distribution of model parameters. Then we suggest a projected gradient descent algorithm to find a robust recourse according to the min-max objective. We show that our DiRRAc framework can be extended to hedge against the misspecification of the mixture weights. Numerical experiments with both synthetic and three real-world datasets demonstrate the benefits of our proposed framework over state-of-the-art recourse methods.
Algorithmic recourse aims to recommend an informative feedback to overturn an unfavorable machine learning decision. We introduce in this paper the Bayesian recourse, a model-agnostic recourse that minimizes the posterior probability odds ratio. Further, we present its min-max robust counterpart with the goal of hedging against future changes in the machine learning model parameters. The robust counterpart explicitly takes into account possible perturbations of the data in a Gaussian mixture ambiguity set prescribed using the optimal transport (Wasserstein) distance. We show that the resulting worst-case objective function can be decomposed into solving a series of two-dimensional optimization subproblems, and the min-max recourse finding problem is thus amenable to a gradient descent algorithm. Contrary to existing methods for generating robust recourses, the robust Bayesian recourse does not require a linear approximation step. The numerical experiment demonstrates the effectiveness of our proposed robust Bayesian recourse facing model shifts. Our code is available at https://github.com/VinAIResearch/robust-bayesian-recourse.
Counterfactual explanations are attracting significant attention due to the flourishing applications of machine learning models in consequential domains. A counterfactual plan consists of multiple possibilities to modify a given instance so that the model's prediction will be altered. As the predictive model can be updated subject to the future arrival of new data, a counterfactual plan may become ineffective or infeasible with respect to the future values of the model parameters. In this work, we study the counterfactual plans under model uncertainty, in which the distribution of the model parameters is partially prescribed using only the first- and second-moment information. First, we propose an uncertainty quantification tool to compute the lower and upper bounds of the probability of validity for any given counterfactual plan. We then provide corrective methods to adjust the counterfactual plan to improve the validity measure. The numerical experiments validate our bounds and demonstrate that our correction increases the robustness of the counterfactual plans in different real-world datasets.
In this paper, we study the learning rate of generalized Bayes estimators in a general setting where the hypothesis class can be uncountable and have an irregular shape, the loss function can have heavy tails, and the optimal hypothesis may not be unique. We prove that under the multi-scale Bernstein's condition, the generalized posterior distribution concentrates around the set of optimal hypotheses and the generalized Bayes estimator can achieve fast learning rate. Our results are applied to show that the standard Bayesian linear regression is robust to heavy-tailed distributions.
This paper contributes a new high-quality dataset for hand gesture recognition in hand hygiene systems, named "MFH". Generally, current datasets are not focused on: (i) fine-grained actions; and (ii) data mismatch between different viewpoints, which are available under realistic settings. To address the aforementioned issues, the MFH dataset is proposed to contain a total of 731147 samples obtained by different camera views in 6 non-overlapping locations. Additionally, each sample belongs to one of seven steps introduced by the World Health Organization (WHO). As a minor contribution, inspired by advances in fine-grained image recognition and distribution adaptation, this paper recommends using the self-supervised learning method to handle these preceding problems. The extensive experiments on the benchmarking MFH dataset show that the introduced method yields competitive performance in both the Accuracy and the Macro F1-score. The code and the MFH dataset are available at https://github.com/willogy-team/hand-gesture-recognition-smc2021.
For several purposes in Natural Language Processing (NLP), such as Information Extraction, Sentiment Analysis or Chatbot, Named Entity Recognition (NER) holds an important role as it helps to determine and categorize entities in text into predefined groups such as the names of persons, locations, quantities, organizations or percentages, etc. In this report, we present our experiments on a neural architecture composed of a Conditional Random Field (CRF) layer stacked on top of a Bi-directional LSTM (BI-LSTM) layer for solving NER tasks. Besides, we also employ a fusion input of embedding vectors (Glove, BERT), which are pre-trained on the huge corpus to boost the generalization capacity of the model. Unfortunately, due to the heavy unbalanced distribution cross-training data, both approaches just attained a bad performance on less training samples classes. To overcome this challenge, we introduce an add-on classification model to split sentences into two different sets: Weak and Strong classes and then designing a couple of Bi-LSTM-CRF models properly to optimize performance on each set. We evaluated our models on the test set and discovered that our method can improve performance for Weak classes significantly by using a very small data set (approximately 0.45\%) compared to the rest classes.