Abstract: Controlling the output probabilities of softmax-based models is a common problem in modern machine learning. Although the $\mathrm{Softmax}$ function provides soft control via its temperature parameter, it cannot enforce hard constraints, such as box constraints, on output probabilities, which can be critical in applications that require reliable and trustworthy models. In this work, we propose the box-constrained softmax ($\mathrm{BCSoftmax}$) function, a novel generalization of the $\mathrm{Softmax}$ function that explicitly enforces lower and upper bounds on output probabilities. Even though $\mathrm{BCSoftmax}$ is formulated as the solution to a box-constrained optimization problem, we develop an exact and efficient algorithm for computing it. As a key application, we introduce two post-hoc calibration methods based on $\mathrm{BCSoftmax}$. The proposed methods mitigate underconfidence and overconfidence in predictive models by learning lower and upper bounds on the output probabilities or logits after model training, thereby enhancing reliability in downstream decision-making tasks. We demonstrate the effectiveness of our methods experimentally on the TinyImageNet, CIFAR-100, and 20NewsGroups datasets, achieving improvements in calibration metrics.
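To make the idea of a box-constrained softmax concrete, the following is a minimal sketch under an assumed definition: BCSoftmax is taken to be the box-constrained counterpart of the variational form of Softmax, i.e. it maximizes $\langle x/\tau, p\rangle - \sum_i p_i \log p_i$ over the probability simplex subject to $a \le p \le b$. Under that assumption the optimality conditions give $p_i = \min(b_i, \max(a_i, e^{x_i/\tau - s}))$ for a single scalar shift $s$ chosen so that the probabilities sum to one. The helper name `bc_softmax` is hypothetical, and it finds $s$ by bisection, a swapped-in technique; it is not the exact algorithm described in the abstract.

```python
import math
from typing import List, Sequence


def bc_softmax(
    logits: Sequence[float],
    lower: Sequence[float],
    upper: Sequence[float],
    temperature: float = 1.0,
    iters: int = 200,
) -> List[float]:
    """Hypothetical box-constrained softmax sketch (bisection, not the paper's algorithm).

    Assumed definition: argmax over {p in simplex, lower <= p <= upper} of
    <logits / temperature, p> + entropy(p). The solution has the form
    p_i = clip(exp(z_i - shift), lower_i, upper_i) with one scalar `shift`.
    """
    n = len(logits)
    if len(lower) != n or len(upper) != n:
        raise ValueError("logits, lower and upper must have the same length")
    if temperature <= 0.0:
        raise ValueError("temperature must be positive")
    if any(not (0.0 <= lo <= up <= 1.0) for lo, up in zip(lower, upper)):
        raise ValueError("bounds must satisfy 0 <= lower_i <= upper_i <= 1")
    if sum(lower) > 1.0 + 1e-12 or sum(upper) < 1.0 - 1e-12:
        raise ValueError("infeasible: need sum(lower) <= 1 <= sum(upper)")

    z = [x / temperature for x in logits]

    def clipped(shift: float) -> List[float]:
        # exp(min(0, .)) equals min(1, exp(.)); since upper_i <= 1 this leaves the
        # clip unchanged while avoiding OverflowError for very negative shifts.
        return [min(up, max(lo, math.exp(min(0.0, zi - shift))))
                for zi, lo, up in zip(z, lower, upper)]

    def mass(shift: float) -> float:
        return sum(clipped(shift))

    # mass(shift) is continuous and non-increasing, so bracket the root around
    # logsumexp(z), the shift that ordinary Softmax would use.
    zmax = max(z)
    center = zmax + math.log(sum(math.exp(zi - zmax) for zi in z))
    lo_s, step = center, 1.0
    for _ in range(100):
        if mass(lo_s) >= 1.0:
            break
        lo_s -= step
        step *= 2.0
    hi_s, step = center, 1.0
    for _ in range(100):
        if mass(hi_s) <= 1.0:
            break
        hi_s += step
        step *= 2.0

    # Bisection keeps mass(lo_s) >= 1 >= mass(hi_s).
    for _ in range(iters):
        mid = 0.5 * (lo_s + hi_s)
        if mass(mid) > 1.0:
            lo_s = mid
        else:
            hi_s = mid
    return clipped(0.5 * (lo_s + hi_s))
```

With `lower = [0, 0, 0]` and `upper = [1, 1, 1]` the bounds are inactive and the output coincides with ordinary Softmax. For logits `[10.0, 0.0, 0.0]` and `upper = [0.5, 1.0, 1.0]`, the confident class is capped at 0.5 and the remaining mass is shared in proportion to $e^{z_i}$ among the unclipped classes, giving 0.25 each.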
Abstract: In scene graph generation (SGG), learning with cross-entropy loss yields biased predictions owing to the severe imbalance in the distribution of relationship labels in the dataset. This study therefore proposes a method for generating scene graphs that uses optimal transport as a measure for comparing two probability distributions. For predicate classification in SGG, we train with an optimal transport loss, which reflects the similarity between labels through the transportation cost. In the proposed approach, this transportation cost is defined using word similarities obtained from a pre-trained model. Experimental results demonstrate that the proposed method outperforms existing methods in terms of mean Recall@50 and mean Recall@100. Furthermore, it improves the recall of relationship labels that are scarce in the dataset.
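The following sketch shows how a similarity-based transportation cost changes the loss. It assumes unregularized optimal transport, a one-hot ground-truth predicate, and a cost $C_{ij} = 1 - \cos(e_i, e_j)$ built from label embeddings $e_i$; the cost definition, the toy two-dimensional embeddings, and the helper names are hypothetical illustrations, not the paper's exact setup. Because a one-hot target forces all predicted mass to move to the true class $y$, the OT distance reduces exactly to the expected cost $\sum_i p_i C_{iy}$.

```python
import math
from typing import List, Sequence


def cosine_cost_matrix(embeddings: Sequence[Sequence[float]]) -> List[List[float]]:
    """C[i][j] = 1 - cos(e_i, e_j): semantically close labels are cheap to confuse."""
    norms = [math.sqrt(sum(v * v for v in e)) for e in embeddings]
    n = len(embeddings)
    return [
        [1.0 - sum(a * b for a, b in zip(embeddings[i], embeddings[j])) / (norms[i] * norms[j])
         for j in range(n)]
        for i in range(n)
    ]


def ot_loss_one_hot(probs: Sequence[float], target: int, cost: Sequence[Sequence[float]]) -> float:
    """Exact OT distance between a predicted distribution and a one-hot target.

    The only feasible transport plan sends every p_i to class `target`,
    so OT(p, delta_target) = sum_i p_i * C[i][target].
    """
    return sum(p * cost[i][target] for i, p in enumerate(probs))


def cross_entropy(probs: Sequence[float], target: int) -> float:
    """Standard cross-entropy for comparison; it ignores label similarity."""
    return -math.log(probs[target])


if __name__ == "__main__":
    # Hypothetical toy embeddings for the predicates "on", "above", "wearing".
    labels = ["on", "above", "wearing"]
    emb = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
    C = cosine_cost_matrix(emb)
    near_miss = [0.2, 0.7, 0.1]  # most mass on "above", which is close to "on"
    far_miss = [0.2, 0.1, 0.7]   # most mass on "wearing", which is far from "on"
    for name, p in [("near", near_miss), ("far", far_miss)]:
        print(name, ot_loss_one_hot(p, 0, C), cross_entropy(p, 0))
```

Both predictions assign probability 0.2 to the true label "on", so their cross-entropy is identical, whereas the OT loss penalizes the confusion with the dissimilar predicate "wearing" more heavily than the confusion with "above".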
Abstract: Factorization machines (FMs) are machine learning predictive models based on second-order feature interactions, and FMs with sparse regularization are called sparse FMs. Such regularization enables feature selection, which selects the most relevant features for accurate prediction, and can therefore improve both model accuracy and interpretability. However, because FMs use second-order feature interactions, feature selection often causes the loss of many relevant feature interactions in the resulting models. In such cases, FMs with regularization designed specifically for feature interaction selection, which aims at interaction-level sparsity, may be preferable to those designed only for feature selection, which aims at feature-level sparsity. In this paper, we present a new regularization scheme for feature interaction selection in FMs. The proposed regularizer is an upper bound of the $\ell_1$ regularizer for the feature interaction matrix, which is computed from the parameter matrix of FMs. For feature interaction selection, our proposed regularizer makes the feature interaction matrix sparse without the restrictions on sparsity patterns imposed by existing methods. We also describe efficient proximal algorithms for the proposed FMs and present theoretical analyses of both the existing and the new regularizers. In addition, we discuss how our ideas can be applied or extended to more accurate feature selection and to other related models such as higher-order FMs and the all-subsets model. The analysis and experimental results on synthetic and real-world datasets show the effectiveness of the proposed methods.
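For reference, a second-order FM scores an input $x \in \mathbb{R}^n$ as $\hat{y}(x) = w_0 + \langle w, x\rangle + \sum_{i<j} \langle v_i, v_j\rangle x_i x_j$, so the feature interaction matrix is the off-diagonal part of $W = VV^\top$, where row $i$ of the parameter matrix $V$ is $v_i$. The sketch below computes the FM score with the standard $O(nk)$ identity, the $\ell_1$ norm $\sum_{i<j} |\langle v_i, v_j\rangle|$ of the interaction matrix, and one elementary upper bound on it obtained from the triangle inequality, $|\langle v_i, v_j\rangle| \le \sum_f |v_{if}||v_{jf}|$. That bound and all function names are illustrative assumptions; the abstract does not state the form of the proposed regularizer, and this is not claimed to be it.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def fm_predict(x: Sequence[float], w0: float, w: Sequence[float], V: Matrix) -> float:
    """Second-order FM score using sum_{i<j} <v_i,v_j> x_i x_j
    = 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]."""
    n, k = len(x), len(V[0])
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s2 = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s2)
    return linear + pairwise


def interaction_l1(V: Matrix) -> float:
    """l1 norm of the strictly upper-triangular part of W = V V^T."""
    n, k = len(V), len(V[0])
    return sum(
        abs(sum(V[i][f] * V[j][f] for f in range(k)))
        for i in range(n)
        for j in range(i + 1, n)
    )


def interaction_l1_upper_bound(V: Matrix) -> float:
    """Illustrative bound: sum_f sum_{i<j} |v_if||v_jf|
    = 0.5 * sum_f [(sum_i |v_if|)^2 - sum_i v_if^2] >= interaction_l1(V)."""
    n, k = len(V), len(V[0])
    bound = 0.0
    for f in range(k):
        col: List[float] = [abs(V[i][f]) for i in range(n)]
        bound += 0.5 * (sum(col) ** 2 - sum(c * c for c in col))
    return bound
```

The bound equals the true $\ell_1$ norm when all entries of $V$ are non-negative and is loose when the terms of $\langle v_i, v_j\rangle$ cancel; like the regularizer described in the abstract, it is computed directly from $V$ without forming $W$.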