Participant recruitment based on unstructured medical texts such as clinical notes and radiology reports has been a challenging yet important task for the cohort establishment in clinical research. Recently, Large Language Models (LLMs) such as ChatGPT have achieved tremendous success in various downstream tasks thanks to their promising performance in language understanding, inference, and generation. It is then natural to test their feasibility in solving the cohort recruitment task, which involves the classification of a given paragraph of medical text into disease label(s). However, when applied to knowledge-intensive problem settings such as medical text classification, where the LLMs are expected to understand the decision made by human experts and accurately identify the implied disease labels, the LLMs show a mediocre performance. A possible explanation is that, by only using the medical text, the LLMs neglect to use the rich context of additional information that languages afford. To this end, we propose to use a knowledge graph as auxiliary information to guide the LLMs in making predictions. Moreover, to further boost the LLMs adapt to the problem setting, we apply a chain-of-thought (CoT) sample selection strategy enhanced by reinforcement learning, which selects a set of CoT samples given each individual medical report. Experimental results and various ablation studies show that our few-shot learning method achieves satisfactory performance compared with fine-tuning strategies and gains superb advantages when the available data is limited. The code and sample dataset of the proposed CohortGPT model is available at: https://anonymous.4open.science/r/CohortGPT-4872/
Alzheimer's disease (AD) is a common form of dementia that severely impacts patient health. As AD impairs the patient's language understanding and expression ability, the speech of AD patients can serve as an indicator of this disease. This study investigates various methods for detecting AD using patients' speech and transcripts data from the DementiaBank Pitt database. The proposed approach involves pre-trained language models and Graph Neural Network (GNN) that constructs a graph from the speech transcript, and extracts features using GNN for AD detection. Data augmentation techniques, including synonym replacement, GPT-based augmenter, and so on, were used to address the small dataset size. Audio data was also introduced, and WavLM model was used to extract audio features. These features were then fused with text features using various methods. Finally, a contrastive learning approach was attempted by converting speech transcripts back to audio and using it for contrastive learning with the original audio. We conducted intensive experiments and analysis on the above methods. Our findings shed light on the challenges and potential solutions in AD detection using speech and audio data.
COVID-19 patient triaging with predictive outcome of the patients upon first present to emergency department (ED) is crucial for improving patient prognosis, as well as better hospital resources management and cross-infection control. We trained a deep feature fusion model to predict patient outcomes, where the model inputs were EHR data including demographic information, co-morbidities, vital signs and laboratory measurements, plus patient's CXR images. The model output was patient outcomes defined as the most insensitive oxygen therapy required. For patients without CXR images, we employed Random Forest method for the prediction. Predictive risk scores for COVID-19 severe outcomes ("CO-RISK" score) were derived from model output and evaluated on the testing dataset, as well as compared to human performance. The study's dataset (the "MGB COVID Cohort") was constructed from all patients presenting to the Mass General Brigham (MGB) healthcare system from March 1st to June 1st, 2020. ED visits with incomplete or erroneous data were excluded. Patients with no test order for COVID or confirmed negative test results were excluded. Patients under the age of 15 were also excluded. Finally, electronic health record (EHR) data from a total of 11060 COVID-19 confirmed or suspected patients were used in this study. Chest X-ray (CXR) images were also collected from each patient if available. Results show that CO-RISK score achieved area under the Curve (AUC) of predicting MV/death (i.e. severe outcomes) in 24 hours of 0.95, and 0.92 in 72 hours on the testing dataset. The model shows superior performance to the commonly used risk scores in ED (CURB-65 and MEWS). Comparing with physician's decisions, CO-RISK score has demonstrated superior performance to human in making ICU/floor decisions.
In recent years, deep learning-based image analysis methods have been widely applied in computer-aided detection, diagnosis and prognosis, and has shown its value during the public health crisis of the novel coronavirus disease 2019 (COVID-19) pandemic. Chest radiograph (CXR) has been playing a crucial role in COVID-19 patient triaging, diagnosing and monitoring, particularly in the United States. Considering the mixed and unspecific signals in CXR, an image retrieval model of CXR that provides both similar images and associated clinical information can be more clinically meaningful than a direct image diagnostic model. In this work we develop a novel CXR image retrieval model based on deep metric learning. Unlike traditional diagnostic models which aims at learning the direct mapping from images to labels, the proposed model aims at learning the optimized embedding space of images, where images with the same labels and similar contents are pulled together. It utilizes multi-similarity loss with hard-mining sampling strategy and attention mechanism to learn the optimized embedding space, and provides similar images to the query image. The model is trained and validated on an international multi-site COVID-19 dataset collected from 3 different sources. Experimental results of COVID-19 image retrieval and diagnosis tasks show that the proposed model can serve as a robust solution for CXR analysis and patient management for COVID-19. The model is also tested on its transferability on a different clinical decision support task, where the pre-trained model is applied to extract image features from a new dataset without any further training. These results demonstrate our deep metric learning based image retrieval model is highly efficient in the CXR retrieval, diagnosis and prognosis, and thus has great clinical value for the treatment and management of COVID-19 patients.
Purpose. Imaging plays an important role in assessing severity of COVID 19 pneumonia. However, semantic interpretation of chest radiography (CXR) findings does not include quantitative description of radiographic opacities. Most current AI assisted CXR image analysis framework do not quantify for regional variations of disease. To address these, we proposed a four region lung segmentation method to assist accurate quantification of COVID 19 pneumonia. Methods. A segmentation model to separate left and right lung is firstly applied, and then a carina and left hilum detection network is used, which are the clinical landmarks to separate the upper and lower lungs. To improve the segmentation performance of COVID 19 images, ensemble strategy incorporating five models is exploited. Using each region, we evaluated the clinical relevance of the proposed method with the Radiographic Assessment of the Quality of Lung Edema (RALE). Results. The proposed ensemble strategy showed dice score of 0.900, which is significantly higher than conventional methods (0.854 0.889). Mean intensities of segmented four regions indicate positive correlation to the extent and density scores of pulmonary opacities under the RALE framework. Conclusion. A deep learning based model in CXR can accurately segment and quantify regional distribution of pulmonary opacities in patients with COVID 19 pneumonia.
Dynamic computed tomography perfusion (CTP) imaging is a promising approach for acute ischemic stroke diagnosis and evaluation. Hemodynamic parametric maps of cerebral parenchyma are calculated from repeated CT scans of the first pass of iodinated contrast through the brain. It is necessary to reduce the dose of CTP for routine applications due to the high radiation exposure from the repeated scans, where image denoising is necessary to achieve a reliable diagnosis. In this paper, we proposed a self-supervised deep learning method for CTP denoising, which did not require any high-dose reference images for training. The network was trained by mapping each frame of CTP to an estimation from its adjacent frames. Because the noise in the source and target was independent, this approach could effectively remove the noise. Being free from high-dose training images granted the proposed method easier adaptation to different scanning protocols. The method was validated on both simulation and a public real dataset. The proposed method achieved improved image quality compared to conventional denoising methods. On the real data, the proposed method also had improved spatial resolution and contrast-to-noise ratio compared to supervised learning which was trained on the simulation data
The stunning progress in face manipulation methods has made it possible to synthesize realistic fake face images, which poses potential threats to our society. It is urgent to have face forensics techniques to distinguish those tampered images. A large scale dataset "FaceForensics++" has provided enormous training data generated from prominent face manipulation methods to facilitate anti-fake research. However, previous works focus more on casting it as a classification problem by only considering a global prediction. Through investigation to the problem, we find that training a classification network often fails to capture high quality features, which might lead to sub-optimal solutions. In this paper, we zoom in on the problem by conducting a pixel-level analysis, i.e. formulating it as a pixel-level segmentation task. By evaluating multiple architectures on both segmentation and classification tasks, We show the superiority of viewing the problem from a segmentation perspective. Different ablation studies are also performed to investigate what makes an effective and efficient anti-fake model. Strong baselines are also established, which, we hope, could shed some light on the field of face forensics.
We present a retrieval based system for landmark retrieval and recognition challenge.There are five parts in retrieval competition system, including feature extraction and matching to get candidates queue; database augmentation and query extension searching; reranking from recognition results and local feature matching. In recognition challenge including: landmark and non-landmark recognition, multiple recognition results voting and reranking using combination of recognition and retrieval results. All of models trained and predicted by PaddlePaddle framework. Using our method, we achieved 2nd place in the Google Landmark Recognition 2019 and 2nd place in the Google Landmark Retrieval 2019 on kaggle. The source code is available at here.
Cardiovascular disease accounts for 1 in every 4 deaths in United States. Accurate estimation of structural and functional cardiac parameters is crucial for both diagnosis and disease management. In this work, we develop an ensemble learning framework for more accurate and robust left ventricle (LV) quantification. The framework combines two 1st-level modules: direct estimation module and a segmentation module. The direct estimation module utilizes Convolutional Neural Network (CNN) to achieve end-to-end quantification. The CNN is trained by taking 2D cardiac images as input and cardiac parameters as output. The segmentation module utilizes a U-Net architecture for obtaining pixel-wise prediction of the epicardium and endocardium of LV from the background. The binary U-Net output is then analyzed by a separate CNN for estimating the cardiac parameters. We then employ linear regression between the 1st-level predictor and ground truth to learn a 2nd-level predictor that ensembles the results from 1st-level modules for the final estimation. Preliminary results by testing the proposed framework on the LVQuan18 dataset show superior performance of the ensemble learning model over the two base modules.