
Andrea Prenner

Calibration and Uncertainty for multiRater Volume Assessment in multiorgan Segmentation (CURVAS) challenge results

May 13, 2025

Meritxell Riera-Marin, Sikha O K, Julia Rodriguez-Comas, Matthias Stefan May, Zhaohong Pan, Xiang Zhou, Xiaokun Liang, Franciskus Xaverius Erick, Andrea Prenner, Cedric Hemon(+22 more)


Abstract: Deep learning (DL) has become the dominant approach for medical image segmentation, yet ensuring the reliability and clinical applicability of these models requires addressing key challenges such as annotation variability, calibration, and uncertainty estimation. To address them, we created the Calibration and Uncertainty for multiRater Volume Assessment in multiorgan Segmentation (CURVAS) challenge, which highlights the critical role of multiple annotators in establishing a more comprehensive ground truth, emphasizing that segmentation is inherently subjective and that leveraging inter-annotator variability is essential for robust model evaluation. Seven teams participated in the challenge, submitting a variety of DL models that were evaluated using metrics such as the Dice Similarity Coefficient (DSC), Expected Calibration Error (ECE), and Continuous Ranked Probability Score (CRPS). By incorporating consensus and dissensus ground truth, we assess how DL models handle uncertainty and whether their confidence estimates align with true segmentation performance. Our findings reinforce the importance of well-calibrated models, as better calibration is strongly correlated with the quality of the results. Furthermore, we demonstrate that segmentation models trained on diverse datasets and enriched with pre-trained knowledge exhibit greater robustness, particularly in cases that deviate from standard anatomical structures. Notably, the best-performing models achieved both high DSC and well-calibrated uncertainty estimates. This work underscores the need for multi-annotator ground truth, thorough calibration assessments, and uncertainty-aware evaluations to develop trustworthy and clinically reliable DL-based medical image segmentation models.

* This challenge was hosted at MICCAI 2024.
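The three metrics named in the CURVAS abstract can be illustrated with a minimal pure-Python sketch. It assumes flattened binary masks for DSC, per-voxel foreground probabilities with ten equal-width confidence bins for ECE, and an empirical (sample-based) predictive distribution, for example of an organ volume, for CRPS. These are common textbook formulations; the exact binning, per-organ averaging, and construction of the volume distribution used in CURVAS are not given here, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def dice(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice similarity coefficient 2*|A & B| / (|A| + |B|) for flattened binary masks."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    size_sum = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Convention (assumption): two empty masks agree perfectly.
    return 1.0 if size_sum == 0 else 2.0 * inter / size_sum


def expected_calibration_error(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """Equal-width-bin ECE for binary per-voxel foreground probabilities."""
    bins: List[List[Tuple[float, float]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        pred = 1 if p >= 0.5 else 0
        conf = p if pred == 1 else 1.0 - p  # confidence in the predicted class
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes to the last bin
        bins[idx].append((conf, 1.0 if pred == y else 0.0))
    n = len(probs)
    ece = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(a for _, a in b) / len(b)
            ece += len(b) / n * abs(accuracy - avg_conf)  # bin weight x calibration gap
    return ece


def crps_ensemble(samples: Sequence[float], observed: float) -> float:
    """CRPS of an empirical distribution: E|X - y| - 0.5 * E|X - X'|."""
    m = len(samples)
    spread_to_obs = sum(abs(x - observed) for x in samples) / m
    pairwise = sum(abs(a - b) for a in samples for b in samples) / (m * m)
    return spread_to_obs - 0.5 * pairwise


print(dice([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
print(expected_calibration_error([0.9, 0.8, 0.3, 0.6], [1, 1, 0, 0]))
print(crps_ensemble([1.0, 2.0, 3.0], 2.0))  # toy volume samples vs. a reference volume
```

Higher DSC is better, while lower ECE and CRPS are better; with a single sample, the CRPS reduces to the absolute error between prediction and reference.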


Prediction of Cardiovascular Risk Factors from Retinal Fundus Images using CNNs

Oct 15, 2024

Andrea Prenner


Abstract: Early detection of cardiovascular disease risk factors is essential for altering the course of the disease. Previous studies showed that deep learning can successfully detect such risk factors from retinal images. This study uses convolutional neural networks (CNNs) to predict the cardiovascular disease risk factors age, BMI, smoking status, HbA1c, systolic blood pressure, diastolic blood pressure, gender, and total cholesterol from retinal images in the UK Biobank data set. By applying contrast enhancement to the retinal images in the form of Gaussian filtering, and by deriving predictions on an individual basis through combining the left and right retinal image predictions, prediction performance improved for age (R2 score of 0.81) and systolic blood pressure (R2 score of 0.39) compared with previous studies using retinal images from the UK Biobank data set. Further, this is the first study to attempt to predict HbA1c and total cholesterol from UK Biobank retinal fundus images. For these variables, the models achieved an R2 score of 0.0579 for HbA1c and an R2 score of 0.0157 for total cholesterol. These results show that the value of deriving predictions for these two risk factors from UK Biobank retinal fundus images is limited.
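The per-individual step and the R2 metric in this abstract can be sketched as follows. The abstract does not say how the left- and right-eye predictions are combined, so simple averaging is assumed here; the function names and toy values are hypothetical, and the Gaussian-filter contrast enhancement is not reproduced.

```python
from statistics import mean
from typing import List, Sequence


def combine_eyes(left: Sequence[float], right: Sequence[float]) -> List[float]:
    """Hypothetical per-person prediction: mean of the left- and right-eye predictions."""
    if len(left) != len(right):
        raise ValueError("need one left and one right prediction per person")
    return [(lft + rgt) / 2.0 for lft, rgt in zip(left, right)]


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy ages for three people (illustrative values only, not study data).
age_true = [50.0, 60.0, 70.0]
left_eye = [54.0, 57.0, 72.0]
right_eye = [48.0, 62.0, 69.0]

print(r2_score(age_true, left_eye))
print(r2_score(age_true, right_eye))
# Averaging helps here because the two eyes' errors partly cancel.
print(r2_score(age_true, combine_eyes(left_eye, right_eye)))
```

An R2 close to 0, as reported for HbA1c and total cholesterol, means the model explains almost none of the variance beyond what always predicting the mean would achieve, which is why those results are read as limited.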


Bias Assessment and Data Drift Detection in Medical Image Analysis: A Survey

Sep 26, 2024

Andrea Prenner, Bernhard Kainz

Abstract: Machine learning (ML) models have gained popularity in medical image analysis given their expert-level performance in many medical domains. To enhance the trustworthiness, acceptance, and regulatory compliance of medical imaging models and to facilitate their integration into clinical settings, we review and categorise methods for ensuring ML reliability, both during development and throughout the model's lifespan. Specifically, we provide an overview of methods that assess models' inner workings with respect to bias encoding and that detect data drift in disease classification models. Additionally, to evaluate the severity of a significant drift, we provide an overview of the methods developed for estimating classifier accuracy without access to ground-truth labels. This should enable practitioners to implement methods that ensure reliable ML deployment and consistent prediction performance over time.
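As a concrete example of label-free accuracy estimation, the sketch below implements average thresholded confidence (ATC; Garg et al., 2022), one published method in this family, shown for illustration only and not necessarily in the form the survey describes. It assumes max-softmax confidences are available for a labelled source validation set and an unlabelled target (deployment) set; the function names and numbers are hypothetical.

```python
from typing import Sequence


def fraction_above(conf: Sequence[float], t: float) -> float:
    """Fraction of examples whose confidence is strictly above threshold t."""
    return sum(1 for c in conf if c > t) / len(conf)


def fit_atc_threshold(source_conf: Sequence[float], source_correct: Sequence[bool]) -> float:
    """Pick t so that the share of labelled source examples above t matches source accuracy."""
    source_acc = sum(source_correct) / len(source_correct)
    # Candidate thresholds: every observed confidence, plus -inf (everything above).
    candidates = [float("-inf")] + sorted(set(source_conf))
    # Brute-force search; fine for a sketch, O(n^2) in the number of examples.
    return min(candidates, key=lambda t: abs(fraction_above(source_conf, t) - source_acc))


def atc_estimate(target_conf: Sequence[float], t: float) -> float:
    """Estimated accuracy on unlabelled target data: share of confidences above t."""
    return fraction_above(target_conf, t)


source_conf = [0.95, 0.9, 0.8, 0.7, 0.6, 0.55]
source_correct = [True, True, True, True, False, False]  # source accuracy = 4/6
t = fit_atc_threshold(source_conf, source_correct)
print(t)  # 0.6

drifted_conf = [0.9, 0.65, 0.58, 0.5]  # hypothetical confidences after a data drift
print(atc_estimate(drifted_conf, t))  # 0.5
```

A drop from the source accuracy (about 0.67 here) to an estimated 0.5 on drifted data is the kind of signal that could help judge how severe a detected drift is; the estimate is only as good as the assumption that confidence stays informative under the shift.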
