Abstract:Designing objective functions robust to label noise is crucial for real-world classification algorithms. In this paper, we investigate the robustness to label noise of an $f$-divergence-based class of objective functions recently proposed for supervised classification, herein referred to as $f$-PML. We show that, in the presence of label noise, any of the $f$-PML objective functions can be corrected to obtain a neural network that is equal to the one learned with the clean dataset. Additionally, we propose an alternative and novel correction approach that, during the test phase, refines the posterior estimated by the neural network trained in the presence of label noise. Then, we demonstrate that, even if the considered $f$-PML objective functions are not symmetric, they are robust to symmetric label noise for any choice of $f$-divergence, without the need for any correction approach. This allows us to prove that the cross-entropy, which belongs to the $f$-PML class, is robust to symmetric label noise. Finally, we show that such a class of objective functions can be used together with refined training strategies, achieving competitive performance against state-of-the-art techniques of classification with label noise.
Abstract:In deep learning, classification tasks are formalized as optimization problems solved via the minimization of the cross-entropy. However, recent advancements in the design of objective functions allow the $f$-divergence measure to generalize the formulation of the optimization problem for classification. With this goal in mind, we adopt a Bayesian perspective and formulate the classification task as a maximum a posteriori probability problem. We propose a class of objective functions based on the variational representation of the $f$-divergence, from which we extract a list of five posterior probability estimators leveraging well-known $f$-divergences. In addition, driven by the challenge of improving the state-of-the-art approach, we propose a bottom-up method that leads us to the formulation of a new objective function (and posterior probability estimator) corresponding to a novel $f$-divergence referred to as shifted log (SL). First, we theoretically prove the convergence property of the posterior probability estimators. Then, we numerically test the set of proposed objective functions in three application scenarios: toy examples, image data sets, and signal detection/decoding problems. The analyzed tasks demonstrate the effectiveness of the proposed estimators and that the SL divergence achieves the highest classification accuracy in almost all the scenarios.
Abstract:The accurate estimation of the mutual information is a crucial task in various applications, including machine learning, communications, and biology, since it enables the understanding of complex systems. High-dimensional data render the task extremely challenging due to the amount of data to be processed and the presence of convoluted patterns. Neural estimators based on variational lower bounds of the mutual information have gained attention in recent years but they are prone to either high bias or high variance as a consequence of the partition function. We propose a novel class of discriminative mutual information estimators based on the variational representation of the $f$-divergence. We investigate the impact of the permutation function used to obtain the marginal training samples and present a novel architectural solution based on derangements. The proposed estimator is flexible as it exhibits an excellent bias/variance trade-off. Experiments on reference scenarios demonstrate that our approach outperforms state-of-the-art neural estimators both in terms of accuracy and complexity.