Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Andrey Fedorov

Department of Radiology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA

MHub.ai: A Simple, Standardized, and Reproducible Platform for AI Models in Medical Imaging

Jan 15, 2026

Leonard Nürnberg, Dennis Bontempi, Suraj Pai, Curtis Lisle, Steve Pieper, Ron Kikinis, Sil van de Leemput, Rahul Soni, Gowtham Murugesan, Cosmin Ciausu(+10 more)

Abstract:Artificial intelligence (AI) has the potential to transform medical imaging by automating image analysis and accelerating clinical research. However, research and clinical use are limited by the wide variety of AI implementations and architectures, inconsistent documentation, and reproducibility issues. Here, we introduce MHub.ai, an open-source, container-based platform that standardizes access to AI models with minimal configuration, promoting accessibility and reproducibility in medical imaging. MHub.ai packages models from peer-reviewed publications into standardized containers that support direct processing of DICOM and other formats, provide a unified application interface, and embed structured metadata. Each model is accompanied by publicly available reference data that can be used to confirm model operation. MHub.ai includes an initial set of state-of-the-art segmentation, prediction, and feature extraction models for different modalities. The modular framework enables adaptation of any model and supports community contributions. We demonstrate the utility of the platform in a clinical use case through comparative evaluation of lung segmentation models. To further strengthen transparency and reproducibility, we publicly release the generated segmentations and evaluation metrics and provide interactive dashboards that allow readers to inspect individual cases and reproduce or extend our analysis. By simplifying model use, MHub.ai enables side-by-side benchmarking with identical execution commands and standardized outputs, and lowers the barrier to clinical translation.

* 41 pages, 15 figures, 6 tables

Via

Access Paper or Ask Questions

In search of truth: Evaluating concordance of AI-based anatomy segmentation models

Dec 17, 2025

Lena Giebeler, Deepa Krishnaswamy, David Clunie, Jakob Wasserthal, Lalith Kumar Shiyam Sundar, Andres Diaz-Pinto, Klaus H. Maier-Hein, Murong Xu, Bjoern Menze, Steve Pieper(+2 more)

Figure 1 for In search of truth: Evaluating concordance of AI-based anatomy segmentation models

Figure 2 for In search of truth: Evaluating concordance of AI-based anatomy segmentation models

Figure 3 for In search of truth: Evaluating concordance of AI-based anatomy segmentation models

Figure 4 for In search of truth: Evaluating concordance of AI-based anatomy segmentation models

Abstract:Purpose AI-based methods for anatomy segmentation can help automate characterization of large imaging datasets. The growing number of similar in functionality models raises the challenge of evaluating them on datasets that do not contain ground truth annotations. We introduce a practical framework to assist in this task. Approach We harmonize the segmentation results into a standard, interoperable representation, which enables consistent, terminology-based labeling of the structures. We extend 3D Slicer to streamline loading and comparison of these harmonized segmentations, and demonstrate how standard representation simplifies review of the results using interactive summary plots and browser-based visualization using OHIF Viewer. To demonstrate the utility of the approach we apply it to evaluating segmentation of 31 anatomical structures (lungs, vertebrae, ribs, and heart) by six open-source models - TotalSegmentator 1.5 and 2.6, Auto3DSeg, MOOSE, MultiTalent, and CADS - for a sample of Computed Tomography (CT) scans from the publicly available National Lung Screening Trial (NLST) dataset. Results We demonstrate the utility of the framework in enabling automating loading, structure-wise inspection and comparison across models. Preliminary results ascertain practical utility of the approach in allowing quick detection and review of problematic results. The comparison shows excellent agreement segmenting some (e.g., lung) but not all structures (e.g., some models produce invalid vertebrae or rib segmentations). Conclusions The resources developed are linked from https://imagingdatacommons.github.io/segmentation-comparison/ including segmentation harmonization scripts, summary plots, and visualization tools. This work assists in model evaluation in absence of ground truth, ultimately enabling informed model selection.

Via

Access Paper or Ask Questions

From Data to Diagnosis: A Large, Comprehensive Bone Marrow Dataset and AI Methods for Childhood Leukemia Prediction

Sep 19, 2025

Henning Höfener, Farina Kock, Martina Pontones, Tabita Ghete, David Pfrang, Nicholas Dickel, Meik Kunz, Daniela P. Schacherer, David A. Clunie, Andrey Fedorov(+2 more)

Abstract:Leukemia diagnosis primarily relies on manual microscopic analysis of bone marrow morphology supported by additional laboratory parameters, making it complex and time consuming. While artificial intelligence (AI) solutions have been proposed, most utilize private datasets and only cover parts of the diagnostic pipeline. Therefore, we present a large, high-quality, publicly available leukemia bone marrow dataset spanning the entire diagnostic process, from cell detection to diagnosis. Using this dataset, we further propose methods for cell detection, cell classification, and diagnosis prediction. The dataset comprises 246 pediatric patients with diagnostic, clinical and laboratory information, over 40 000 cells with bounding box annotations and more than 28 000 of these with high-quality class labels, making it the most comprehensive dataset publicly available. Evaluation of the AI models yielded an average precision of 0.96 for the cell detection, an area under the curve of 0.98, and an F1-score of 0.61 for the 33-class cell classification, and a mean F1-score of 0.90 for the diagnosis prediction using predicted cell counts. While the proposed approaches demonstrate their usefulness for AI-assisted diagnostics, the dataset will foster further research and development in the field, ultimately contributing to more precise diagnoses and improved patient outcomes.

Via

Access Paper or Ask Questions

Benchmarking of Deep Learning Methods for Generic MRI Multi-OrganAbdominal Segmentation

Jul 23, 2025

Deepa Krishnaswamy, Cosmin Ciausu, Steve Pieper, Ron Kikinis, Benjamin Billot, Andrey Fedorov

Abstract:Recent advances in deep learning have led to robust automated tools for segmentation of abdominal computed tomography (CT). Meanwhile, segmentation of magnetic resonance imaging (MRI) is substantially more challenging due to the inherent signal variability and the increased effort required for annotating training datasets. Hence, existing approaches are trained on limited sets of MRI sequences, which might limit their generalizability. To characterize the landscape of MRI abdominal segmentation tools, we present here a comprehensive benchmarking of the three state-of-the-art and open-source models: MRSegmentator, MRISegmentator-Abdomen, and TotalSegmentator MRI. Since these models are trained using labor-intensive manual annotation cycles, we also introduce and evaluate ABDSynth, a SynthSeg-based model purely trained on widely available CT segmentations (no real images). More generally, we assess accuracy and generalizability by leveraging three public datasets (not seen by any of the evaluated methods during their training), which span all major manufacturers, five MRI sequences, as well as a variety of subject conditions, voxel resolutions, and fields-of-view. Our results reveal that MRSegmentator achieves the best performance and is most generalizable. In contrast, ABDSynth yields slightly less accurate results, but its relaxed requirements in training data make it an alternative when the annotation budget is limited. The evaluation code and datasets are given for future benchmarking at https://github.com/deepakri201/AbdoBench, along with inference code and weights for ABDSynth.

Via

Access Paper or Ask Questions

Rule-based outlier detection of AI-generated anatomy segmentations

Jun 20, 2024

Deepa Krishnaswamy, Vamsi Krishna Thiriveedhi, Cosmin Ciausu, David Clunie, Steve Pieper, Ron Kikinis, Andrey Fedorov

Figure 1 for Rule-based outlier detection of AI-generated anatomy segmentations

Figure 2 for Rule-based outlier detection of AI-generated anatomy segmentations

Figure 3 for Rule-based outlier detection of AI-generated anatomy segmentations

Figure 4 for Rule-based outlier detection of AI-generated anatomy segmentations

Abstract:There is a dire need for medical imaging datasets with accompanying annotations to perform downstream patient analysis. However, it is difficult to manually generate these annotations, due to the time-consuming nature, and the variability in clinical conventions. Artificial intelligence has been adopted in the field as a potential method to annotate these large datasets, however, a lack of expert annotations or ground truth can inhibit the adoption of these annotations. We recently made a dataset publicly available including annotations and extracted features of up to 104 organs for the National Lung Screening Trial using the TotalSegmentator method. However, the released dataset does not include expert-derived annotations or an assessment of the accuracy of the segmentations, limiting its usefulness. We propose the development of heuristics to assess the quality of the segmentations, providing methods to measure the consistency of the annotations and a comparison of results to the literature. We make our code and related materials publicly available at https://github.com/ImagingDataCommons/CloudSegmentatorResults and interactive tools at https://huggingface.co/spaces/ImagingDataCommons/CloudSegmentatorResults.

Via

Access Paper or Ask Questions

Automatic classification of prostate MR series type using image content and metadata

Apr 16, 2024

Deepa Krishnaswamy, Bálint Kovács, Stefan Denner, Steve Pieper, David Clunie, Christopher P. Bridge, Tina Kapur, Klaus H. Maier-Hein, Andrey Fedorov

Figure 1 for Automatic classification of prostate MR series type using image content and metadata

Figure 2 for Automatic classification of prostate MR series type using image content and metadata

Figure 3 for Automatic classification of prostate MR series type using image content and metadata

Abstract:With the wealth of medical image data, efficient curation is essential. Assigning the sequence type to magnetic resonance images is necessary for scientific studies and artificial intelligence-based analysis. However, incomplete or missing metadata prevents effective automation. We therefore propose a deep-learning method for classification of prostate cancer scanning sequences based on a combination of image data and DICOM metadata. We demonstrate superior results compared to metadata or image data alone, and make our code publicly available at https://github.com/deepakri201/DICOMScanClassification.

Via

Access Paper or Ask Questions

Towards Automatic Abdominal MRI Organ Segmentation: Leveraging Synthesized Data Generated From CT Labels

Mar 22, 2024

Cosmin Ciausu, Deepa Krishnaswamy, Benjamin Billot, Steve Pieper, Ron Kikinis, Andrey Fedorov

Figure 1 for Towards Automatic Abdominal MRI Organ Segmentation: Leveraging Synthesized Data Generated From CT Labels

Figure 2 for Towards Automatic Abdominal MRI Organ Segmentation: Leveraging Synthesized Data Generated From CT Labels

Figure 3 for Towards Automatic Abdominal MRI Organ Segmentation: Leveraging Synthesized Data Generated From CT Labels

Figure 4 for Towards Automatic Abdominal MRI Organ Segmentation: Leveraging Synthesized Data Generated From CT Labels

Abstract:Deep learning has shown great promise in the ability to automatically annotate organs in magnetic resonance imaging (MRI) scans, for example, of the brain. However, despite advancements in the field, the ability to accurately segment abdominal organs remains difficult across MR. In part, this may be explained by the much greater variability in image appearance and severely limited availability of training labels. The inherent nature of computed tomography (CT) scans makes it easier to annotate, resulting in a larger availability of expert annotations for the latter. We leverage a modality-agnostic domain randomization approach, utilizing CT label maps to generate synthetic images on-the-fly during training, further used to train a U-Net segmentation network for abdominal organs segmentation. Our approach shows comparable results compared to fully-supervised segmentation methods trained on MR data. Our method results in Dice scores of 0.90 (0.08) and 0.91 (0.08) for the right and left kidney respectively, compared to a pretrained nnU-Net model yielding 0.87 (0.20) and 0.91 (0.03). We will make our code publicly available.

* 13 pages

Via

Access Paper or Ask Questions

Real-Time Dynamic Data Driven Deformable Registration for Image-Guided Neurosurgery: Computational Aspects

Sep 06, 2023

Nikos Chrisochoides, Andrey Fedorov, Yixun Liu, Andriy Kot, Panos Foteinos, Fotis Drakopoulos, Christos Tsolakis, Emmanuel Billias, Olivier Clatz, Nicholas Ayache(+3 more)

Figure 1 for Real-Time Dynamic Data Driven Deformable Registration for Image-Guided Neurosurgery: Computational Aspects

Figure 2 for Real-Time Dynamic Data Driven Deformable Registration for Image-Guided Neurosurgery: Computational Aspects

Figure 3 for Real-Time Dynamic Data Driven Deformable Registration for Image-Guided Neurosurgery: Computational Aspects

Figure 4 for Real-Time Dynamic Data Driven Deformable Registration for Image-Guided Neurosurgery: Computational Aspects

Abstract:Current neurosurgical procedures utilize medical images of various modalities to enable the precise location of tumors and critical brain structures to plan accurate brain tumor resection. The difficulty of using preoperative images during the surgery is caused by the intra-operative deformation of the brain tissue (brain shift), which introduces discrepancies concerning the preoperative configuration. Intra-operative imaging allows tracking such deformations but cannot fully substitute for the quality of the pre-operative data. Dynamic Data Driven Deformable Non-Rigid Registration (D4NRR) is a complex and time-consuming image processing operation that allows the dynamic adjustment of the pre-operative image data to account for intra-operative brain shift during the surgery. This paper summarizes the computational aspects of a specific adaptive numerical approximation method and its variations for registering brain MRIs. It outlines its evolution over the last 15 years and identifies new directions for the computational aspects of the technique.

Via

Access Paper or Ask Questions

Enrichment of the NLST and NSCLC-Radiomics computed tomography collections with AI-derived annotations

May 31, 2023

Deepa Krishnaswamy, Dennis Bontempi, Vamsi Thiriveedhi, Davide Punzo, David Clunie, Christopher P Bridge, Hugo JWL Aerts, Ron Kikinis, Andrey Fedorov

Abstract:Public imaging datasets are critical for the development and evaluation of automated tools in cancer imaging. Unfortunately, many do not include annotations or image-derived features, complicating their downstream analysis. Artificial intelligence-based annotation tools have been shown to achieve acceptable performance and thus can be used to automatically annotate large datasets. As part of the effort to enrich public data available within NCI Imaging Data Commons (IDC), here we introduce AI-generated annotations for two collections of computed tomography images of the chest, NSCLC-Radiomics, and the National Lung Screening Trial. Using publicly available AI algorithms we derived volumetric annotations of thoracic organs at risk, their corresponding radiomics features, and slice-level annotations of anatomical landmarks and regions. The resulting annotations are publicly available within IDC, where the DICOM format is used to harmonize the data and achieve FAIR principles. The annotations are accompanied by cloud-enabled notebooks demonstrating their use. This study reinforces the need for large, publicly accessible curated datasets and demonstrates how AI can be used to aid in cancer imaging.

Via

Access Paper or Ask Questions

The NCI Imaging Data Commons as a platform for reproducible research in computational pathology

Mar 16, 2023

Daniela P. Schacherer, Markus D. Herrmann, David A. Clunie, Henning Höfener, William Clifford, William J. R. Longabaugh, Steve Pieper, Ron Kikinis, Andrey Fedorov, André Homeyer

Figure 1 for The NCI Imaging Data Commons as a platform for reproducible research in computational pathology

Figure 2 for The NCI Imaging Data Commons as a platform for reproducible research in computational pathology

Figure 3 for The NCI Imaging Data Commons as a platform for reproducible research in computational pathology

Figure 4 for The NCI Imaging Data Commons as a platform for reproducible research in computational pathology

Abstract:Objective: Reproducibility is critical for translating machine learning-based (ML) solutions in computational pathology (CompPath) into practice. However, an increasing number of studies report difficulties in reproducing ML results. The NCI Imaging Data Commons (IDC) is a public repository of >120 cancer image collections, including >38,000 whole-slide images (WSIs), that is designed to be used with cloud-based ML services. Here, we explore the potential of the IDC to facilitate reproducibility of CompPath research. Materials and Methods: The IDC realizes the FAIR principles: All images are encoded according to the DICOM standard, persistently identified, discoverable via rich metadata, and accessible via open tools. Taking advantage of this, we implemented two experiments in which a representative ML-based method for classifying lung tumor tissue was trained and/or evaluated on different datasets from the IDC. To assess reproducibility, the experiments were run multiple times with independent but identically configured sessions of common ML services. Results: The AUC values of different runs of the same experiment were generally consistent and in the same order of magnitude as a similar, previously published study. However, there were occasional small variations in AUC values of up to 0.044, indicating a practical limit to reproducibility. Discussion and conclusion: By realizing the FAIR principles, the IDC enables other researchers to reuse exactly the same datasets. Cloud-based ML services enable others to run CompPath experiments in an identically configured computing environment without having to own high-performance hardware. The combination of both makes it possible to approach the reproducibility limit.

Via

Access Paper or Ask Questions