Get our free extension to see links to code for papers anywhere online!Free extension: code links for papers anywhere!Free add-on: See code for papers anywhere!

Zifu Wang, Teodora Popordanoska, Jeroen Bertels, Robin Lemmens, Matthew B. Blaschko

The soft Dice loss (SDL) has taken a pivotal role in many automated segmentation pipelines in the medical imaging community. Over the last years, some reasons behind its superior functioning have been uncovered and further optimizations have been explored. However, there is currently no implementation that supports its direct use in settings with soft labels. Hence, a synergy between the use of SDL and research leveraging the use of soft labels, also in the context of model calibration, is still missing. In this work, we introduce Dice semimetric losses (DMLs), which (i) are by design identical to SDL in a standard setting with hard labels, but (ii) can be used in settings with soft labels. Our experiments on the public QUBIQ, LiTS and KiTS benchmarks confirm the potential synergy of DMLs with soft labels (e.g. averaging, label smoothing, and knowledge distillation) over hard labels (e.g. majority voting and random selection). As a result, we obtain superior Dice scores and model calibration, which supports the wider adoption of DMLs in practice. Code is available at \href{https://github.com/zifuwanggg/JDTLosses}{https://github.com/zifuwanggg/JDTLosses}.

Via

Teodora Popordanoska, Raphael Sayer, Matthew B. Blaschko

Calibrated probabilistic classifiers are models whose predicted probabilities can directly be interpreted as uncertainty estimates. It has been shown recently that deep neural networks are poorly calibrated and tend to output overconfident predictions. As a remedy, we propose a low-bias, trainable calibration error estimator based on Dirichlet kernel density estimates, which asymptotically converges to the true $L_p$ calibration error. This novel estimator enables us to tackle the strongest notion of multiclass calibration, called canonical (or distribution) calibration, while other common calibration methods are tractable only for top-label and marginal calibration. The computational complexity of our estimator is $\mathcal{O}(n^2)$, the convergence rate is $\mathcal{O}(n^{-1/2})$, and it is unbiased up to $\mathcal{O}(n^{-2})$, achieved by a geometric series debiasing scheme. In practice, this means that the estimator can be applied to small subsets of data, enabling efficient estimation and mini-batch updates. The proposed method has a natural choice of kernel, and can be used to generate consistent estimates of other quantities based on conditional expectation, such as the sharpness of a probabilistic classifier. Empirical results validate the correctness of our estimator, and demonstrate its utility in canonical calibration error estimation and calibration error regularized risk minimization.

Via

Teodora Popordanoska, Aleksei Tiulpin, Wacha Bounliphone, Matthew B. Blaschko

The eigendecomposition of a matrix is the central procedure in probabilistic models based on matrix factorization, for instance principal component analysis and topic models. Quantifying the uncertainty of such a decomposition based on a finite sample estimate is essential to reasoning under uncertainty when employing such models. This paper tackles the challenge of computing confidence bounds on the individual entries of eigenvectors of a covariance matrix of fixed dimension. Moreover, we derive a method to bound the entries of the inverse covariance matrix, the so-called precision matrix. The assumptions behind our method are minimal and require that the covariance matrix exists, and its empirical estimator converges to the true covariance. We make use of the theory of U-statistics to bound the $L_2$ perturbation of the empirical covariance matrix. From this result, we obtain bounds on the eigenvectors using Weyl's theorem and the eigenvalue-eigenvector identity and we derive confidence intervals on the entries of the precision matrix using matrix inversion perturbation bounds. As an application of these results, we demonstrate a new statistical test, which allows us to test for non-zero values of the precision matrix. We compare this test to the well-known Fisher-z test for partial correlations, and demonstrate the soundness and scalability of the proposed statistical test, as well as its application to real-world data from medical and physics domains.

Via

Teodora Popordanoska, Jeroen Bertels, Dirk Vandermeulen, Frederik Maes, Matthew B. Blaschko

Machine learning driven medical image segmentation has become standard in medical image analysis. However, deep learning models are prone to overconfident predictions. This has led to a renewed focus on calibrated predictions in the medical imaging and broader machine learning communities. Calibrated predictions are estimates of the probability of a label that correspond to the true expected value of the label conditioned on the confidence. Such calibrated predictions have utility in a range of medical imaging applications, including surgical planning under uncertainty and active learning systems. At the same time it is often an accurate volume measurement that is of real importance for many medical applications. This work investigates the relationship between model calibration and volume estimation. We demonstrate both mathematically and empirically that if the predictor is calibrated per image, we can obtain the correct volume by taking an expectation of the probability scores per pixel/voxel of the image. Furthermore, we show that convex combinations of calibrated classifiers preserve volume estimation, but do not preserve calibration. Therefore, we conclude that having a calibrated predictor is a sufficient, but not necessary condition for obtaining an unbiased estimate of the volume. We validate our theoretical findings empirically on a collection of 18 different (calibrated) training strategies on the tasks of glioma volume estimation on BraTS 2018, and ischemic stroke lesion volume estimation on ISLES 2018 datasets.

Via

Teodora Popordanoska, Mohit Kumar, Stefano Teso

We introduce explanatory guided learning (XGL), a novel interactive learning strategy in which a machine guides a human supervisor toward selecting informative examples for a classifier. The guidance is provided by means of global explanations, which summarize the classifier's behavior on different regions of the instance space and expose its flaws. Compared to other explanatory interactive learning strategies, which are machine-initiated and rely on local explanations, XGL is designed to be robust against cases in which the explanations supplied by the machine oversell the classifier's quality. Moreover, XGL leverages global explanations to open up the black-box of human-initiated interaction, enabling supervisors to select informative examples that challenge the learned model. By drawing a link to interactive machine teaching, we show theoretically that global explanations are a viable approach for guiding supervisors. Our simulations show that explanatory guided learning avoids overselling the model's quality and performs comparably or better than machine- and human-initiated interactive learning strategies in terms of model quality.

Via

Teodora Popordanoska, Mohit Kumar, Stefano Teso

Recent work has demonstrated the promise of combining local explanations with active learning for understanding and supervising black-box models. Here we show that, under specific conditions, these algorithms may misrepresent the quality of the model being learned. The reason is that the machine illustrates its beliefs by predicting and explaining the labels of the query instances: if the machine is unaware of its own mistakes, it may end up choosing queries on which it performs artificially well. This biases the "narrative" presented by the machine to the user.We address this narrative bias by introducing explanatory guided learning, a novel interactive learning strategy in which: i) the supervisor is in charge of choosing the query instances, while ii) the machine uses global explanations to illustrate its overall behavior and to guide the supervisor toward choosing challenging, informative instances. This strategy retains the key advantages of explanatory interaction while avoiding narrative bias and compares favorably to active learning in terms of sample complexity. An initial empirical evaluation with a clustering-based prototype highlights the promise of our approach.

Via