
Cathal McCague


on behalf of the AIX-COVNET collaboration

Calibrating Ensembles for Scalable Uncertainty Quantification in Deep Learning-based Medical Segmentation

Sep 20, 2022
Thomas Buddenkotte, Lorena Escudero Sanchez, Mireia Crispin-Ortuzar, Ramona Woitek, Cathal McCague, James D. Brenton, Ozan Öktem, Evis Sala, Leonardo Rundo

Figures 1–4 for Calibrating Ensembles for Scalable Uncertainty Quantification in Deep Learning-based Medical Segmentation

Uncertainty quantification in automated image analysis is highly desired in many applications. Typically, machine learning models for classification or segmentation are developed to provide only binary answers; however, quantifying a model's uncertainty can play a critical role, for example in active learning or human-machine interaction. Uncertainty quantification is especially difficult for deep learning-based models, which are the state of the art in many imaging applications. Current uncertainty quantification approaches do not scale well to high-dimensional real-world problems. Scalable solutions often rely on classical techniques, such as dropout during inference or training ensembles of identical models with different random seeds, to obtain a posterior distribution. In this paper, we show that these approaches fail to approximate the classification probability. Instead, we propose a scalable and intuitive framework for calibrating ensembles of deep learning models to produce uncertainty quantification measurements that approximate the classification probability. On unseen test data, we demonstrate improved calibration, sensitivity (in two out of three cases) and precision compared with the standard approaches. We further motivate the use of our method in active learning, in creating pseudo-labels to learn from unlabeled images, and in human-machine collaboration.
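The abstract contrasts plain ensemble averaging with a calibrated ensemble. As a rough illustration only (not the authors' exact method), the sketch below averages softmax probabilities over ensemble members and then fits a single temperature parameter on held-out labels via grid search, a standard post-hoc calibration technique; all function names and the grid range are assumptions for this example.

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over the last axis
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ensemble_probs(logits_per_model):
    # average softmax probabilities across ensemble members
    # logits_per_model: (n_models, n_samples, n_classes)
    return np.mean([softmax(l) for l in logits_per_model], axis=0)

def nll(probs, labels):
    # negative log-likelihood of the true labels
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def fit_temperature(logits, labels, grid=np.linspace(0.5, 5.0, 46)):
    # grid-search the temperature T minimising NLL on held-out data;
    # T > 1 softens overconfident predictions
    best_T, best_loss = 1.0, np.inf
    for T in grid:
        loss = nll(softmax(logits / T), labels)
        if loss < best_loss:
            best_T, best_loss = T, loss
    return best_T
```

In use, one would fit the temperature on a held-out calibration split and then apply `softmax(logits / T)` at test time; the paper's contribution is a calibration scheme specifically for ensembles, for which this temperature-scaling sketch is only a simplified stand-in.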


Machine learning for COVID-19 detection and prognostication using chest radiographs and CT scans: a systematic methodological review

Sep 01, 2020
Michael Roberts, Derek Driggs, Matthew Thorpe, Julian Gilbey, Michael Yeung, Stephan Ursprung, Angelica I. Aviles-Rivero, Christian Etmann, Cathal McCague, Lucian Beer, Jonathan R. Weir-McCall, Zhongzhao Teng, James H. F. Rudd, Evis Sala, Carola-Bibiane Schönlieb

Figures 1–4 for Machine learning for COVID-19 detection and prognostication using chest radiographs and CT scans: a systematic methodological review

Background: Machine learning methods offer great potential for fast and accurate detection and prognostication of COVID-19 from standard-of-care chest radiographs (CXR) and computed tomography (CT) images. In this systematic review we critically evaluate the machine learning methodologies employed in the rapidly growing literature. Methods: We searched EMBASE via OVID, MEDLINE via PubMed, bioRxiv, medRxiv and arXiv for published papers and preprints uploaded from Jan 1, 2020 to June 24, 2020. Studies that considered machine learning models for the diagnosis or prognosis of COVID-19 from CXR or CT images were included. A methodological quality review of each paper was performed against established benchmarks to ensure that the review focuses only on high-quality, reproducible papers. This study is registered with PROSPERO [CRD42020188887]. Interpretation: Our review finds that none of the developed models discussed are of potential clinical use due to methodological flaws and underlying biases. This is a major weakness, given the urgency with which validated COVID-19 models are needed. Typically, we find that the documentation of a model's development is insufficient to make the results reproducible; therefore, of 168 candidate papers only 29 were deemed reproducible and subsequently considered in this review. We therefore encourage authors to use established machine learning checklists to ensure sufficient documentation is made available, and to follow the PROBAST (prediction model risk of bias assessment tool) framework to determine the underlying biases in their model development process and to mitigate these where possible. This is key to the safe clinical implementation that is urgently needed.

* 25 pages, 3 figures, 2 tables 