Coronary artery calcification (CAC) is a strong and independent predictor of cardiovascular disease (CVD). However, manual assessment of CAC often requires radiological expertise, time, and invasive imaging techniques. The purpose of this multicenter study is to validate an automated cardiac plaque detection model using a 3D multiclass nnU-Net for gated and non-gated non-contrast chest CT volumes. CT scans were performed at three tertiary care hospitals and collected as three datasets, respectively. Heart, aorta, and lung segmentations were determined using TotalSegmentator, while plaques in the coronary arteries and heart valves were manually labeled for 801 volumes. In this work we demonstrate how the nnU-Net semantic segmentation pipeline may be adapted to detect plaques in the coronary arteries and valves. With a linear correction, nnU-Net deep learning methods may also accurately estimate Agatston scores on chest non-contrast CT scans. Compared to manual Agatson scoring, automated Agatston scoring indicated a slope of the linear regression of 0.841 with an intercept of +16 HU (R2 = 0.97). These results are an improvement over previous work assessing automated Agatston score computation in non-gated CT scans.
Pheochromocytomas and Paragangliomas (PPGLs) are rare adrenal and extra-adrenal tumors which have the potential to metastasize. For the management of patients with PPGLs, CT is the preferred modality of choice for precise localization and estimation of their progression. However, due to the myriad variations in size, morphology, and appearance of the tumors in different anatomical regions, radiologists are posed with the challenge of accurate detection of PPGLs. Since clinicians also need to routinely measure their size and track their changes over time across patient visits, manual demarcation of PPGLs is quite a time-consuming and cumbersome process. To ameliorate the manual effort spent for this task, we propose an automated method to detect PPGLs in CT studies via a proxy segmentation task. As only weak annotations for PPGLs in the form of prospectively marked 2D bounding boxes on an axial slice were available, we extended these 2D boxes into weak 3D annotations and trained a 3D full-resolution nnUNet model to directly segment PPGLs. We evaluated our approach on a dataset consisting of chest-abdomen-pelvis CTs of 255 patients with confirmed PPGLs. We obtained a precision of 70% and sensitivity of 64.1% with our proposed approach when tested on 53 CT studies. Our findings highlight the promising nature of detecting PPGLs via segmentation, and furthers the state-of-the-art in this exciting yet challenging area of rare cancer management.
* Accepted at SPIE 2024. arXiv admin note: text overlap with
Multi-parametric MRI of the body is routinely acquired for the identification of abnormalities and diagnosis of diseases. However, a standard naming convention for the MRI protocols and associated sequences does not exist due to wide variations in imaging practice at institutions and myriad MRI scanners from various manufacturers being used for imaging. The intensity distributions of MRI sequences differ widely as a result, and there also exists information conflicts related to the sequence type in the DICOM headers. At present, clinician oversight is necessary to ensure that the correct sequence is being read and used for diagnosis. This poses a challenge when specific series need to be considered for building a cohort for a large clinical study or for developing AI algorithms. In order to reduce clinician oversight and ensure the validity of the DICOM headers, we propose an automated method to classify the 3D MRI sequence acquired at the levels of the chest, abdomen, and pelvis. In our pilot work, our 3D DenseNet-121 model achieved an F1 score of 99.5% at differentiating 5 common MRI sequences obtained by three Siemens scanners (Aera, Verio, Biograph mMR). To the best of our knowledge, we are the first to develop an automated method for the 3D classification of MRI sequences in the chest, abdomen, and pelvis, and our work has outperformed the previous state-of-the-art MRI series classifiers.
The skeletal region is one of the common sites of metastatic spread of cancer in the breast and prostate. CT is routinely used to measure the size of lesions in the bones. However, they can be difficult to spot due to the wide variations in their sizes, shapes, and appearances. Precise localization of such lesions would enable reliable tracking of interval changes (growth, shrinkage, or unchanged status). To that end, an automated technique to detect bone lesions is highly desirable. In this pilot work, we developed a pipeline to detect bone lesions (lytic, blastic, and mixed) in CT volumes via a proxy segmentation task. First, we used the bone lesions that were prospectively marked by radiologists in a few 2D slices of CT volumes and converted them into weak 3D segmentation masks. Then, we trained a 3D full-resolution nnUNet model using these weak 3D annotations to segment the lesions and thereby detected them. Our automated method detected bone lesions in CT with a precision of 96.7% and recall of 47.3% despite the use of incomplete and partial training data. To the best of our knowledge, we are the first to attempt the direct detection of bone lesions in CT via a proxy segmentation task.
Recent studies indicate that Generative Pre-trained Transformer 4 with Vision (GPT-4V) outperforms human physicians in medical challenge tasks. However, these evaluations primarily focused on the accuracy of multi-choice questions alone. Our study extends the current scope by conducting a comprehensive analysis of GPT-4V's rationales of image comprehension, recall of medical knowledge, and step-by-step multimodal reasoning when solving New England Journal of Medicine (NEJM) Image Challenges - an imaging quiz designed to test the knowledge and diagnostic capabilities of medical professionals. Evaluation results confirmed that GPT-4V outperforms human physicians regarding multi-choice accuracy (88.0% vs. 77.0%, p=0.034). GPT-4V also performs well in cases where physicians incorrectly answer, with over 80% accuracy. However, we discovered that GPT-4V frequently presents flawed rationales in cases where it makes the correct final choices (27.3%), most prominent in image comprehension (21.6%). Regardless of GPT-4V's high accuracy in multi-choice questions, our findings emphasize the necessity for further in-depth evaluations of its rationales before integrating such models into clinical workflows.
Purpose: Lymph nodes (LNs) in the chest have a tendency to enlarge due to various pathologies, such as lung cancer or pneumonia. Clinicians routinely measure nodal size to monitor disease progression, confirm metastatic cancer, and assess treatment response. However, variations in their shapes and appearances make it cumbersome to identify LNs, which reside outside of most organs. Methods: We propose to segment LNs in the mediastinum by leveraging the anatomical priors of 28 different structures (e.g., lung, trachea etc.) generated by the public TotalSegmentator tool. The CT volumes from 89 patients available in the public NIH CT Lymph Node dataset were used to train three 3D nnUNet models to segment LNs. The public St. Olavs dataset containing 15 patients (out-of-training-distribution) was used to evaluate the segmentation performance. Results: For the 15 test patients, the 3D cascade nnUNet model obtained the highest Dice score of 72.2 +- 22.3 for mediastinal LNs with short axis diameter $\geq$ 8mm and 54.8 +- 23.8 for all LNs respectively. These results represent an improvement of 10 points over a current approach that was evaluated on the same test dataset. Conclusion: To our knowledge, we are the first to harness 28 distinct anatomical priors to segment mediastinal LNs, and our work can be extended to other nodal zones in the body. The proposed method has immense potential for improved patient outcomes through the identification of enlarged nodes in initial staging CT scans.
Purpose: Body composition measurements from routine abdominal CT can yield personalized risk assessments for asymptomatic and diseased patients. In particular, attenuation and volume measures of muscle and fat are associated with important clinical outcomes, such as cardiovascular events, fractures, and death. This study evaluates the reliability of an Internal tool for the segmentation of muscle and fat (subcutaneous and visceral) as compared to the well-established public TotalSegmentator tool. Methods: We assessed the tools across 900 CT series from the publicly available SAROS dataset, focusing on muscle, subcutaneous fat, and visceral fat. The Dice score was employed to assess accuracy in subcutaneous fat and muscle segmentation. Due to the lack of ground truth segmentations for visceral fat, Cohen's Kappa was utilized to assess segmentation agreement between the tools. Results: Our Internal tool achieved a 3% higher Dice (83.8 vs. 80.8) for subcutaneous fat and a 5% improvement (87.6 vs. 83.2) for muscle segmentation respectively. A Wilcoxon signed-rank test revealed that our results were statistically different with p<0.01. For visceral fat, the Cohen's kappa score of 0.856 indicated near-perfect agreement between the two tools. Our internal tool also showed very strong correlations for muscle volume (R^2=0.99), muscle attenuation (R^2=0.93), and subcutaneous fat volume (R^2=0.99) with a moderate correlation for subcutaneous fat attenuation (R^2=0.45). Conclusion: Our findings indicated that our Internal tool outperformed TotalSegmentator in measuring subcutaneous fat and muscle. The high Cohen's Kappa score for visceral fat suggests a reliable level of agreement between the two tools. These results demonstrate the potential of our tool in advancing the accuracy of body composition analysis.
As a new emerging and promising type of generative models, diffusion models have proven to outperform Generative Adversarial Networks (GANs) in multiple tasks, including image synthesis. In this work, we explore semantic image synthesis for abdominal CT using conditional diffusion models, which can be used for downstream applications such as data augmentation. We systematically evaluated the performance of three diffusion models, as well as to other state-of-the-art GAN-based approaches, and studied the different conditioning scenarios for the semantic mask. Experimental results demonstrated that diffusion models were able to synthesize abdominal CT images with better quality. Additionally, encoding the mask and the input separately is more effective than na\"ive concatenating.
* This paper has been accepted at Deep Generative Models workshop at
The integration of AI into radiology introduces opportunities for improved clinical care provision and efficiency but it demands a meticulous approach to mitigate potential risks as with any other new technology. Beginning with rigorous pre-deployment evaluation and validation, the focus should be on ensuring models meet the highest standards of safety, effectiveness and efficacy for their intended applications. Input and output guardrails implemented during production usage act as an additional layer of protection, identifying and addressing individual failures as they occur. Continuous post-deployment monitoring allows for tracking population-level performance (data drift), fairness, and value delivery over time. Scheduling reviews of post-deployment model performance and educating radiologists about new algorithmic-driven findings is critical for AI to be effective in clinical practice. Recognizing that no single AI solution can provide absolute assurance even when limited to its intended use, the synergistic application of quality assurance at multiple levels - regulatory, clinical, technical, and ethical - is emphasized. Collaborative efforts between stakeholders spanning healthcare systems, industry, academia, and government are imperative to address the multifaceted challenges involved. Trust in AI is an earned privilege, contingent on a broad set of goals, among them transparently demonstrating that the AI adheres to the same rigorous safety, effectiveness and efficacy standards as other established medical technologies. By doing so, developers can instil confidence among providers and patients alike, enabling the responsible scaling of AI and the realization of its potential benefits. The roadmap presented herein aims to expedite the achievement of deployable, reliable, and safe AI in radiology.