This paper addresses the problem of designing reliable prediction models that abstain from predictions when faced with uncertain or out-of-distribution samples - a recently proposed problem known as Selective Classification in the presence of Out-of-Distribution data (SCOD). We make three key contributions to SCOD. Firstly, we demonstrate that the optimal SCOD strategy involves a Bayes classifier for in-distribution (ID) data and a selector represented as a stochastic linear classifier in a 2D space, using i) the conditional risk of the ID classifier, and ii) the likelihood ratio of ID and out-of-distribution (OOD) data as input. This contrasts with suboptimal strategies from current OOD detection methods and the Softmax Information Retaining Combination (SIRC), specifically developed for SCOD. Secondly, we establish that in a distribution-free setting, the SCOD problem is not Probably Approximately Correct learnable when relying solely on an ID data sample. Third, we introduce POSCOD, a simple method for learning a plugin estimate of the optimal SCOD strategy from both an ID data sample and an unlabeled mixture of ID and OOD data. Our empirical results confirm the theoretical findings and demonstrate that our proposed method, POSCOD, out performs existing OOD methods in effectively addressing the SCOD problem.
The optimal prediction strategy for out-of-distribution (OOD) setups is a fundamental question in machine learning. In this paper, we address this question and present several contributions. We propose three reject option models for OOD setups: the Cost-based model, the Bounded TPR-FPR model, and the Bounded Precision-Recall model. These models extend the standard reject option models used in non-OOD setups and define the notion of an optimal OOD selective classifier. We establish that all the proposed models, despite their different formulations, share a common class of optimal strategies. Motivated by the optimal strategy, we introduce double-score OOD methods that leverage uncertainty scores from two chosen OOD detectors: one focused on OOD/ID discrimination and the other on misclassification detection. The experimental results consistently demonstrate the superior performance of this simple strategy compared to state-of-the-art methods. Additionally, we propose novel evaluation metrics derived from the definition of the optimal strategy under the proposed OOD rejection models. These new metrics provide a comprehensive and reliable assessment of OOD methods without the deficiencies observed in existing evaluation approaches.
Comparing different age estimation methods poses a challenge due to the unreliability of published results, stemming from inconsistencies in the benchmarking process. Previous studies have reported continuous performance improvements over the past decade using specialized methods; however, our findings challenge these claims. We argue that, for age estimation tasks outside of the low-data regime, designing specialized methods is unnecessary, and the standard approach of utilizing cross-entropy loss is sufficient. This paper aims to address the benchmark shortcomings by evaluating state-of-the-art age estimation methods in a unified and comparable setting. We systematically analyze the impact of various factors, including facial alignment, facial coverage, image resolution, image representation, model architecture, and the amount of data on age estimation results. Surprisingly, these factors often exert a more significant influence than the choice of the age estimation method itself. We assess the generalization capability of each method by evaluating the cross-dataset performance for publicly available age estimation datasets. The results emphasize the importance of using consistent data preprocessing practices and establishing standardized benchmarks to ensure reliable and meaningful comparisons. The source code is available at https://github.com/paplhjak/Facial-Age-Estimation-Benchmark.