The Segment Anything Model (SAM), a foundation model for general image segmentation, has demonstrated impressive zero-shot performance across numerous natural image segmentation tasks. However, SAM's performance declines significantly when applied to medical images, primarily due to the substantial disparity between the natural and medical image domains. To effectively adapt SAM to medical images, it is important to incorporate critical third-dimensional information, i.e., volumetric or temporal knowledge, during fine-tuning. At the same time, we aim to harness SAM's pre-trained weights within its original 2D backbone to the fullest extent. In this paper, we introduce a modality-agnostic SAM adaptation framework, named MA-SAM, that is applicable to various volumetric and video medical data. Our method is rooted in the parameter-efficient fine-tuning strategy, updating only a small portion of weight increments while preserving the majority of SAM's pre-trained weights. By injecting a series of 3D adapters into the transformer blocks of the image encoder, our method enables the pre-trained 2D backbone to extract third-dimensional information from input data. The effectiveness of our method has been comprehensively evaluated on four medical image segmentation tasks, using 10 public datasets across CT, MRI, and surgical video data. Remarkably, without using any prompt, our method consistently outperforms various state-of-the-art 3D approaches, surpassing nnU-Net by 0.9%, 2.6%, and 9.9% in Dice for CT multi-organ segmentation, MRI prostate segmentation, and surgical scene segmentation, respectively. Our model also demonstrates strong generalization and excels in challenging tumor segmentation when prompts are used. Our code is available at: https://github.com/cchen-cc/MA-SAM.
Participant recruitment based on unstructured medical texts such as clinical notes and radiology reports has been a challenging yet important task for cohort establishment in clinical research. Recently, Large Language Models (LLMs) such as ChatGPT have achieved tremendous success in various downstream tasks thanks to their promising performance in language understanding, inference, and generation. It is then natural to test their feasibility in solving the cohort recruitment task, which involves the classification of a given paragraph of medical text into disease label(s). However, when applied to knowledge-intensive problem settings such as medical text classification, where the LLMs are expected to understand the decisions made by human experts and accurately identify the implied disease labels, the LLMs show mediocre performance. A possible explanation is that, by only using the medical text, the LLMs neglect to use the rich context of additional information that languages afford. To this end, we propose to use a knowledge graph as auxiliary information to guide the LLMs in making predictions. Moreover, to further help the LLMs adapt to the problem setting, we apply a chain-of-thought (CoT) sample selection strategy enhanced by reinforcement learning, which selects a set of CoT samples for each individual medical report. Experimental results and various ablation studies show that our few-shot learning method achieves satisfactory performance compared with fine-tuning strategies and offers a clear advantage when the available data is limited. The code and sample dataset of the proposed CohortGPT model are available at: https://anonymous.4open.science/r/CohortGPT-4872/
We proved that a trained model in supervised deep learning minimizes the conditional risk for each input (Theorem 2.1). This property provided insights into the behavior of trained models and established a connection between supervised and unsupervised learning in some cases. In addition, when the labels are intractable but can be written as a conditional risk minimizer, we proved an equivalent form of the original supervised learning problem with accessible labels (Theorem 2.2). We demonstrated that many existing works, such as Noise2Score, Noise2Noise, and score function estimation, can be explained by our theorem. Moreover, we derived a property of classification problems with noisy labels using Theorem 2.1 and validated it using the MNIST dataset. Furthermore, we proposed a method to estimate uncertainty in image super-resolution based on Theorem 2.2 and validated it using the ImageNet dataset. Our code is available on GitHub.
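As a point of orientation, the textbook squared-loss case illustrates what a conditional risk minimizer looks like and why noisy targets can stand in for clean ones. This is a standard result stated under the assumption of zero-mean noise, not a restatement of Theorem 2.1 or Theorem 2.2; the symbols $f^{*}$, $\hat{y}$, $\tilde{y}$, and $n$ are introduced here for illustration only.

```latex
\begin{align}
f^{*}(x) &= \arg\min_{\hat{y}} \, \mathbb{E}\left[ \lVert y - \hat{y} \rVert^{2} \mid x \right]
          = \mathbb{E}[\, y \mid x \,], \\
\tilde{y} &= y + n, \quad \mathbb{E}[\, n \mid x \,] = 0
\;\Longrightarrow\;
\arg\min_{\hat{y}} \, \mathbb{E}\left[ \lVert \tilde{y} - \hat{y} \rVert^{2} \mid x \right]
          = \mathbb{E}[\, \tilde{y} \mid x \,] = \mathbb{E}[\, y \mid x \,].
\end{align}
```

The second line is the usual argument behind Noise2Noise-style training: replacing the clean target with a noisy one leaves the minimizer unchanged as long as the noise has zero conditional mean.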
Triaging COVID-19 patients by predicting their outcomes upon first presentation to the emergency department (ED) is crucial for improving patient prognosis, as well as for better hospital resource management and cross-infection control. We trained a deep feature fusion model to predict patient outcomes, where the model inputs were electronic health record (EHR) data including demographic information, comorbidities, vital signs and laboratory measurements, plus the patient's chest X-ray (CXR) images. The model output was the patient outcome, defined as the most intensive oxygen therapy required. For patients without CXR images, we employed the Random Forest method for the prediction. Predictive risk scores for COVID-19 severe outcomes (the "CO-RISK" score) were derived from the model output and evaluated on the testing dataset, as well as compared to human performance. The study's dataset (the "MGB COVID Cohort") was constructed from all patients presenting to the Mass General Brigham (MGB) healthcare system from March 1st to June 1st, 2020. ED visits with incomplete or erroneous data were excluded. Patients with no test order for COVID or with confirmed negative test results were excluded. Patients under the age of 15 were also excluded. Finally, EHR data from a total of 11060 COVID-19 confirmed or suspected patients were used in this study. CXR images were also collected from each patient if available. Results show that the CO-RISK score achieved an area under the curve (AUC) of 0.95 for predicting MV/death (i.e., severe outcomes) within 24 hours, and 0.92 within 72 hours, on the testing dataset. The model shows superior performance to the risk scores commonly used in the ED (CURB-65 and MEWS). Compared with physicians' decisions, the CO-RISK score demonstrated superior performance to humans in making ICU/floor decisions.
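For readers less familiar with the AUC metric reported above, the following is a minimal sketch of how it can be computed from binary outcome labels and risk scores. It uses the pairwise (Mann-Whitney) definition; the function name `roc_auc` and the toy data are assumptions for illustration and are unrelated to the study's actual evaluation code or data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise definition.

    AUC equals the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case,
    with ties counted as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example (made-up numbers): 1 = severe outcome, 0 = non-severe outcome.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
```

The quadratic loop is chosen for readability; a rank-based formulation gives the same value and scales better to large cohorts.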
In recent years, deep learning-based image analysis methods have been widely applied in computer-aided detection, diagnosis and prognosis, and have shown their value during the public health crisis of the novel coronavirus disease 2019 (COVID-19) pandemic. Chest radiography (CXR) has been playing a crucial role in COVID-19 patient triaging, diagnosis and monitoring, particularly in the United States. Considering the mixed and nonspecific signals in CXR, an image retrieval model of CXR that provides both similar images and associated clinical information can be more clinically meaningful than a direct image diagnostic model. In this work, we develop a novel CXR image retrieval model based on deep metric learning. Unlike traditional diagnostic models, which aim to learn the direct mapping from images to labels, the proposed model aims to learn an optimized embedding space of images, where images with the same labels and similar contents are pulled together. It utilizes a multi-similarity loss with a hard-mining sampling strategy and an attention mechanism to learn the optimized embedding space, and provides images similar to the query image. The model is trained and validated on an international multi-site COVID-19 dataset collected from 3 different sources. Experimental results of COVID-19 image retrieval and diagnosis tasks show that the proposed model can serve as a robust solution for CXR analysis and patient management for COVID-19. The model is also tested for its transferability to a different clinical decision support task, where the pre-trained model is applied to extract image features from a new dataset without any further training. These results demonstrate that our deep metric learning-based image retrieval model is highly efficient in CXR retrieval, diagnosis and prognosis, and thus has great clinical value for the treatment and management of COVID-19 patients.
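To make the idea of pulling same-label images together more concrete, here is a minimal sketch of a multi-similarity loss over a small batch of embeddings. It follows the commonly published form of the loss and omits the hard-mining sampling and the attention mechanism mentioned above. The function names and the hyperparameter values `alpha`, `beta`, and `lam` are illustrative assumptions, not the settings used in this work.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors (lists of floats)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def multi_similarity_loss(embeddings, labels, alpha=2.0, beta=50.0, lam=0.5):
    """Multi-similarity loss without pair mining (illustrative sketch).

    For each anchor i, positives (same label) are penalized when their
    similarity falls below the threshold lam, and negatives (different
    label) are penalized when their similarity rises above it.
    """
    n = len(embeddings)
    total = 0.0
    for i in range(n):
        pos_sum = 0.0
        neg_sum = 0.0
        for k in range(n):
            if k == i:
                continue
            s = cosine_similarity(embeddings[i], embeddings[k])
            if labels[k] == labels[i]:
                pos_sum += math.exp(-alpha * (s - lam))
            else:
                neg_sum += math.exp(beta * (s - lam))
        total += math.log1p(pos_sum) / alpha + math.log1p(neg_sum) / beta
    return total / n


# Toy 2-D embeddings (made up): two tight clusters.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_when_labels_match_clusters = multi_similarity_loss(toy_embeddings, [0, 0, 1, 1])
loss_when_labels_cut_across = multi_similarity_loss(toy_embeddings, [0, 1, 0, 1])
```

With labels that agree with the clusters the loss is small, and it grows when labels cut across clusters, which is the signal that drives the embedding space toward label-consistent neighborhoods.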
Purpose. Imaging plays an important role in assessing the severity of COVID-19 pneumonia. However, semantic interpretation of chest radiography (CXR) findings does not include a quantitative description of radiographic opacities. Most current AI-assisted CXR image analysis frameworks do not quantify regional variations of disease. To address these limitations, we proposed a four-region lung segmentation method to assist accurate quantification of COVID-19 pneumonia. Methods. A segmentation model is first applied to separate the left and right lungs, and then a detection network locates the carina and left hilum, which are the clinical landmarks used to separate the upper and lower lungs. To improve the segmentation performance on COVID-19 images, an ensemble strategy incorporating five models is exploited. Using each region, we evaluated the clinical relevance of the proposed method with the Radiographic Assessment of the Quality of Lung Edema (RALE). Results. The proposed ensemble strategy showed a Dice score of 0.900, which is significantly higher than that of conventional methods (0.854 to 0.889). Mean intensities of the four segmented regions indicate a positive correlation with the extent and density scores of pulmonary opacities under the RALE framework. Conclusion. A deep learning-based model in CXR can accurately segment and quantify the regional distribution of pulmonary opacities in patients with COVID-19 pneumonia.
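The two quantities reported in the Results, the Dice score of a segmentation and the mean intensity of each lung region, can be sketched as follows. This is a minimal illustration on nested lists; the function names, the region labels 1 to 4, and the toy arrays are assumptions and do not reflect the authors' implementation.

```python
def dice_score(pred_mask, true_mask):
    """Dice = 2 * |A and B| / (|A| + |B|) for two binary masks (lists of rows)."""
    intersection = 0
    pred_area = 0
    true_area = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            intersection += 1 if (p and t) else 0
            pred_area += 1 if p else 0
            true_area += 1 if t else 0
    if pred_area + true_area == 0:
        return 1.0  # both masks empty: treated here as perfect agreement
    return 2.0 * intersection / (pred_area + true_area)


def region_mean_intensities(image, region_map):
    """Mean pixel intensity per region label; label 0 (background) is skipped."""
    sums = {}
    counts = {}
    for image_row, region_row in zip(image, region_map):
        for value, label in zip(image_row, region_row):
            if label == 0:
                continue
            sums[label] = sums.get(label, 0.0) + value
            counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


# Toy example (made-up values): a 2x4 image split into four regions.
toy_image = [[10, 20, 30, 40],
             [50, 60, 70, 80]]
toy_regions = [[1, 1, 2, 2],
               [3, 3, 4, 4]]
toy_means = region_mean_intensities(toy_image, toy_regions)
```

The per-region means are the kind of value that could then be correlated with region-level extent and density scores.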
Computed Tomography (CT) takes X-ray measurements of the subject to reconstruct tomographic images. As X-rays are ionizing radiation, it is desirable to control the total X-ray dose for safety reasons. Therefore, we can only select a limited number of measurement angles and assign each of them a limited amount of dose. Traditional methods such as compressed sensing usually select the angles randomly and distribute the allowed dose equally among them. In most CT reconstruction models, the emphasis is on designing effective image representations, while much less emphasis is placed on improving the scanning strategy. The simple scanning strategy of random angle selection and equal dose distribution performs well in general, but it may not be ideal for each individual subject. It is more desirable to design a personalized scanning strategy for each subject to obtain better reconstruction results. In this paper, we propose to use Reinforcement Learning (RL) to learn a personalized scanning policy that selects the angles and the dose at each chosen angle for each individual subject. We first formulate the CT scanning process as a Markov decision process (MDP), and then use modern deep RL methods to solve it. The learned personalized scanning strategy not only leads to better reconstruction results, but also generalizes well when combined with different reconstruction algorithms.
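One way to picture the MDP formulation is as an episodic environment in which each action picks an angle and a dose, and the episode ends when the dose budget or the angle budget is used up. The sketch below is purely illustrative: the class `CTScanEnv`, its state layout, the reward defined as the gain in a caller-supplied quality score, and the placeholder `toy_quality` function are all assumptions, since the abstract does not specify these details.

```python
import math


class CTScanEnv:
    """Toy episodic environment for angle and dose selection (hypothetical)."""

    def __init__(self, num_angles, dose_budget, max_steps, quality_fn):
        self.num_angles = num_angles
        self.dose_budget = dose_budget
        self.max_steps = max_steps
        self.quality_fn = quality_fn  # maps a list of (angle, dose) to a score
        self.reset()

    def reset(self):
        self.measurements = []
        self.remaining_dose = self.dose_budget
        self.prev_quality = self.quality_fn(self.measurements)
        self.done = False
        return self._state()

    def _state(self):
        # State: measurements taken so far plus the unspent dose.
        return (tuple(self.measurements), self.remaining_dose)

    def step(self, action):
        if self.done:
            raise RuntimeError("episode finished; call reset()")
        angle, dose = action
        if not 0 <= angle < self.num_angles:
            raise ValueError("angle index out of range")
        if dose <= 0:
            raise ValueError("dose must be positive")
        dose = min(dose, self.remaining_dose)  # never exceed the budget
        self.measurements.append((angle, dose))
        self.remaining_dose -= dose
        quality = self.quality_fn(self.measurements)
        reward = quality - self.prev_quality  # reward = improvement in quality
        self.prev_quality = quality
        self.done = (self.remaining_dose <= 0
                     or len(self.measurements) >= self.max_steps)
        return self._state(), reward, self.done


def toy_quality(measurements):
    """Placeholder for reconstruction quality: diminishing returns per angle."""
    dose_per_angle = {}
    for angle, dose in measurements:
        dose_per_angle[angle] = dose_per_angle.get(angle, 0.0) + dose
    return sum(math.sqrt(d) for d in dose_per_angle.values())


env = CTScanEnv(num_angles=8, dose_budget=4, max_steps=3, quality_fn=toy_quality)
state = env.reset()
state, reward, done = env.step((0, 1))  # measure at angle index 0 with dose 1
```

In a realistic setting the quality function would be replaced by a reconstruction step followed by an image-quality metric, and a deep RL agent would choose the actions.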
Detecting cerebral aneurysms is an important clinical task in brain computed tomography angiography (CTA). However, human interpretation can be time-consuming due to the small size of some aneurysms. In this work, we proposed DeepBrain, a deep learning-based cerebral aneurysm detection and localization algorithm. The algorithm consisted of a 3D faster region-proposal convolutional neural network for aneurysm detection and localization, and a 3D multi-scale fully convolutional neural network for false positive reduction. Furthermore, a novel hierarchical non-maximum suppression algorithm was proposed to process the detection results in 3D, which greatly reduced the time complexity by eliminating unnecessary comparisons. DeepBrain was trained and tested on 550 brain CTA scans and achieved a sensitivity of 93.3% with 0.3 false positives per patient on average.
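For context on the post-processing step, the following is a minimal sketch of standard greedy non-maximum suppression on 3D boxes. It is the plain baseline that compares every candidate against all kept boxes, not the hierarchical variant proposed above, whose details are not given in the abstract. The box layout `(z1, y1, x1, z2, y2, x2)`, the function names, and the threshold value are assumptions for illustration.

```python
def iou_3d(a, b):
    """Intersection over union of two axis-aligned 3D boxes.

    Each box is (z1, y1, x1, z2, y2, x2) with min corner first.
    """
    intersection = 1.0
    for axis in range(3):
        low = max(a[axis], b[axis])
        high = min(a[axis + 3], b[axis + 3])
        if high <= low:
            return 0.0  # no overlap along this axis
        intersection *= high - low
    volume_a = (a[3] - a[0]) * (a[4] - a[1]) * (a[5] - a[2])
    volume_b = (b[3] - b[0]) * (b[4] - b[1]) * (b[5] - b[2])
    return intersection / (volume_a + volume_b - intersection)


def nms_3d(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop overlapping ones, repeat.

    Returns the indices of the kept boxes in descending score order.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou_3d(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Toy detections (made up): two overlapping boxes and one far away.
toy_boxes = [(0, 0, 0, 10, 10, 10), (1, 1, 1, 11, 11, 11), (20, 20, 20, 30, 30, 30)]
toy_scores = [0.9, 0.8, 0.7]
kept = nms_3d(toy_boxes, toy_scores)
```

This baseline performs a number of comparisons that grows quadratically with the number of candidates in the worst case, which is the cost a hierarchical scheme aims to reduce.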
Dynamic computed tomography perfusion (CTP) imaging is a promising approach for acute ischemic stroke diagnosis and evaluation. Hemodynamic parametric maps of cerebral parenchyma are calculated from repeated CT scans of the first pass of iodinated contrast through the brain. Because the repeated scans incur high radiation exposure, the dose of CTP must be reduced for routine applications, which in turn makes image denoising necessary to achieve a reliable diagnosis. In this paper, we proposed a self-supervised deep learning method for CTP denoising, which did not require any high-dose reference images for training. The network was trained by mapping each frame of CTP to an estimation from its adjacent frames. Because the noise in the source and target was independent, this approach could effectively remove the noise. Being free from high-dose training images made the proposed method easier to adapt to different scanning protocols. The method was validated on both simulated data and a public real dataset. The proposed method achieved improved image quality compared to conventional denoising methods. On the real data, the proposed method also had improved spatial resolution and contrast-to-noise ratio compared to supervised learning trained on the simulation data.
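The training-pair construction described above can be sketched as follows: each interior frame serves as the network input, and the target is an estimate built from its temporal neighbors. The abstract does not say how that estimate is formed, so the simple average of the previous and next frames used here is an assumption, as are the function name and the flattened-list frame representation.

```python
def adjacent_frame_pairs(frames):
    """Build (input, target) pairs for self-supervised denoising.

    frames: list of frames in temporal order, each a flat list of floats.
    The target for frame t is the average of frames t-1 and t+1 (an
    assumed estimator); the first and last frames are skipped because
    they lack a neighbor on one side.
    """
    pairs = []
    for t in range(1, len(frames) - 1):
        previous_frame = frames[t - 1]
        next_frame = frames[t + 1]
        target = [(a + b) / 2.0 for a, b in zip(previous_frame, next_frame)]
        pairs.append((frames[t], target))
    return pairs


# Toy sequence (made-up values): four frames of two pixels each.
toy_frames = [[0.0, 2.0], [1.0, 3.0], [2.0, 4.0], [5.0, 5.0]]
toy_pairs = adjacent_frame_pairs(toy_frames)
```

Since the target never includes the input frame itself, the noise in the two sides of each pair is independent, which is the property the method relies on.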