Pranav Rajpurkar

Harvard University

The Impact of AI Assistance on Radiology Reporting: A Pilot Study Using Simulated AI Draft Reports

Dec 16, 2024

Julián N. Acosta, Siddhant Dogra, Subathra Adithan, Kay Wu, Michael Moritz, Stephen Kwak, Pranav Rajpurkar

Abstract:Radiologists face increasing workload pressures amid growing imaging volumes, creating risks of burnout and delayed reporting times. While artificial intelligence (AI) based automated radiology report generation shows promise for reporting workflow optimization, evidence of its real-world impact on clinical accuracy and efficiency remains limited. This study evaluated the effect of draft reports on radiology reporting workflows by conducting a three-reader multi-case study comparing standard versus AI-assisted reporting workflows. In both workflows, radiologists reviewed the cases and modified either a standard template (standard workflow) or an AI-generated draft report (AI-assisted workflow) to create the final report. For controlled evaluation, we used GPT-4 to generate simulated AI drafts and deliberately introduced 1-3 errors in half the cases to mimic real AI system performance. The AI-assisted workflow significantly reduced average reporting time from 573 to 435 seconds (p=0.003), without a statistically significant difference in clinically significant errors between workflows. These findings suggest that AI-generated drafts can meaningfully accelerate radiology reporting while maintaining diagnostic accuracy, offering a practical solution to address mounting workload challenges in clinical practice.

MedAutoCorrect: Image-Conditioned Autocorrection in Medical Reporting

Dec 04, 2024

Arnold Caleb Asiimwe, Dídac Surís, Pranav Rajpurkar, Carl Vondrick

Abstract:In medical reporting, the accuracy of radiological reports, whether generated by humans or machine learning algorithms, is critical. We tackle a new task in this paper: image-conditioned autocorrection of inaccuracies within these reports. Using the MIMIC-CXR dataset, we first intentionally introduce a diverse range of errors into reports. Subsequently, we propose a two-stage framework capable of pinpointing these errors and then making corrections, simulating an autocorrection process. This method aims to address the shortcomings of existing automated medical reporting systems, like factual errors and incorrect conclusions, enhancing report reliability in vital healthcare applications. Importantly, our approach could serve as a guardrail, ensuring the accuracy and trustworthiness of automated report generation. Experiments on established datasets and state of the art report generation models validate this method's potential in correcting medical reporting errors.

FactCheXcker: Mitigating Measurement Hallucinations in Chest X-ray Report Generation Models

Nov 27, 2024

Alice Heiman, Xiaoman Zhang, Emma Chen, Sung Eun Kim, Pranav Rajpurkar

Abstract:Medical vision-language models often struggle with generating accurate quantitative measurements in radiology reports, leading to hallucinations that undermine clinical reliability. We introduce FactCheXcker, a modular framework that de-hallucinates radiology report measurements by leveraging an improved query-code-update paradigm. Specifically, FactCheXcker employs specialized modules and the code generation capabilities of large language models to solve measurement queries generated based on the original report. After extracting measurable findings, the results are incorporated into an updated report. We evaluate FactCheXcker on endotracheal tube placement, which accounts for an average of 78% of report measurements, using the MIMIC-CXR dataset and 11 medical report-generation models. Our results show that FactCheXcker significantly reduces hallucinations, improves measurement precision, and maintains the quality of the original reports. Specifically, FactCheXcker improves the performance of all 11 models and achieves an average improvement of 94.0% in reducing measurement hallucinations measured by mean absolute error.

ReXrank: A Public Leaderboard for AI-Powered Radiology Report Generation

Nov 22, 2024

Xiaoman Zhang, Hong-Yu Zhou, Xiaoli Yang, Oishi Banerjee, Julián N. Acosta, Josh Miller, Ouwen Huang, Pranav Rajpurkar

Abstract:AI-driven models have demonstrated significant potential in automating radiology report generation for chest X-rays. However, there is no standardized benchmark for objectively evaluating their performance. To address this, we present ReXrank, https://rexrank.ai, a public leaderboard and challenge for assessing AI-powered radiology report generation. Our framework incorporates ReXGradient, the largest test dataset consisting of 10,000 studies, and three public datasets (MIMIC-CXR, IU-Xray, CheXpert Plus) for report generation assessment. ReXrank employs 8 evaluation metrics and separately assesses models capable of generating only findings sections and those providing both findings and impressions sections. By providing this standardized evaluation framework, ReXrank enables meaningful comparisons of model performance and offers crucial insights into their robustness across diverse clinical settings. Beyond its current focus on chest X-rays, ReXrank's framework sets the stage for comprehensive evaluation of automated reporting across the full spectrum of medical imaging.

RadFlag: A Black-Box Hallucination Detection Method for Medical Vision Language Models

Nov 01, 2024

Sraavya Sambara, Serena Zhang, Oishi Banerjee, Julian Acosta, John Fahrner, Pranav Rajpurkar

Abstract:Generating accurate radiology reports from medical images is a clinically important but challenging task. While current Vision Language Models (VLMs) show promise, they are prone to generating hallucinations, potentially compromising patient care. We introduce RadFlag, a black-box method to enhance the accuracy of radiology report generation. Our method uses a sampling-based flagging technique to find hallucinatory generations that should be removed. We first sample multiple reports at varying temperatures and then use a Large Language Model (LLM) to identify claims that are not consistently supported across samples, indicating that the model has low confidence in those claims. Using a calibrated threshold, we flag a fraction of these claims as likely hallucinations, which should undergo extra review or be automatically rejected. Our method achieves high precision when identifying both individual hallucinatory sentences and reports that contain hallucinations. As an easy-to-use, black-box system that only requires access to a model's temperature parameter, RadFlag is compatible with a wide range of radiology report generation models and has the potential to broadly improve the quality of automated radiology reporting.

* 15 pages, 8 figures
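
The sampling-and-flagging idea described in the RadFlag abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `min_support` threshold value, and the exact-match support check (a simple stand-in for the LLM-based check the abstract describes) are all assumptions made for illustration. Reports are represented as lists of sentences.

```python
from typing import Callable, List


def normalize(sentence: str) -> str:
    """Lowercase, trim surrounding spaces/periods, and collapse whitespace."""
    return " ".join(sentence.lower().strip(" .").split())


def exact_match_support(claim: str, report: List[str]) -> bool:
    """Hypothetical stand-in for an LLM judge: a sampled report supports a
    claim only if it contains the same sentence after normalization."""
    target = normalize(claim)
    return any(normalize(s) == target for s in report)


def flag_claims(
    candidate: List[str],
    samples: List[List[str]],
    supports: Callable[[str, List[str]], bool] = exact_match_support,
    min_support: float = 0.5,  # assumed threshold, not a calibrated value
) -> List[str]:
    """Return the candidate claims supported by fewer than `min_support`
    of the sampled reports, in their original order."""
    flagged = []
    for claim in candidate:
        n_supporting = sum(1 for report in samples if supports(claim, report))
        score = n_supporting / len(samples) if samples else 0.0
        if score < min_support:
            flagged.append(claim)
    return flagged


candidate_report = [
    "Heart size is normal.",
    "Endotracheal tube tip is 3 cm above the carina.",
]
sampled_reports = [
    ["Heart size is normal."],
    ["heart size is normal", "No pleural effusion."],
    ["Heart size is normal.", "Endotracheal tube tip is 3 cm above the carina."],
]
flagged = flag_claims(candidate_report, sampled_reports)
print(flagged)
```

In this toy input the heart-size claim appears in all three samples, while the tube-position claim appears in only one, so only the latter falls below the assumed threshold. A real system would replace `exact_match_support` with a semantic check and calibrate the threshold on held-out data.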

A Perspective for Adapting Generalist AI to Specialized Medical AI Applications and Their Challenges

Oct 28, 2024

Zifeng Wang, Hanyin Wang, Benjamin Danek, Ying Li, Christina Mack, Hoifung Poon, Yajun Wang, Pranav Rajpurkar, Jimeng Sun

Abstract:The integration of Large Language Models (LLMs) into medical applications has sparked widespread interest across the healthcare industry, from drug discovery and development to clinical decision support, assisting telemedicine, medical devices, and healthcare insurance applications. This perspective paper aims to discuss the inner workings of building LLM-powered medical AI applications and introduces a comprehensive framework for their development. We review existing literature and outline the unique challenges of applying LLMs in specialized medical contexts. Additionally, we introduce a three-step framework to organize medical LLM research activities: 1) Modeling: breaking down complex medical workflows into manageable steps for developing medical-specific models; 2) Optimization: optimizing the model performance with crafted prompts and integrating external knowledge and tools; and 3) System engineering: decomposing complex tasks into subtasks and leveraging human expertise for building medical AI applications. Furthermore, we offer a detailed use case playbook that describes various LLM-powered medical AI applications, such as optimizing clinical trial design, enhancing clinical decision support, and advancing medical imaging analysis. Finally, we discuss various challenges and considerations for building medical AI applications with LLMs, such as handling hallucination issues, data ownership and compliance, privacy, intellectual property considerations, compute cost, sustainability issues, and responsible AI requirements.

ReXplain: Translating Radiology into Patient-Friendly Video Reports

Oct 01, 2024

Luyang Luo, Jenanan Vairavamurthy, Xiaoman Zhang, Abhinav Kumar, Ramon R. Ter-Oganesyan, Stuart T. Schroff, Dan Shilo, Rydhwana Hossain, Mike Moritz, Pranav Rajpurkar

Abstract:Radiology reports often remain incomprehensible to patients, undermining patient-centered care. We present ReXplain (Radiology eXplanation), an innovative AI-driven system that generates patient-friendly video reports for radiology findings. ReXplain uniquely integrates a large language model for text simplification, an image segmentation model for anatomical region identification, and an avatar generation tool, producing comprehensive explanations with plain language, highlighted imagery, and 3D organ renderings. Our proof-of-concept study with five board-certified radiologists indicates that ReXplain could accurately deliver radiological information and effectively simulate one-on-one consultations. This work demonstrates a new paradigm in AI-assisted medical communication, potentially improving patient engagement and satisfaction in radiology care, and opens new avenues for research in multimodal medical communication.

* 13 pages

ReXErr: Synthesizing Clinically Meaningful Errors in Diagnostic Radiology Reports

Sep 17, 2024

Vishwanatha M. Rao, Serena Zhang, Julian N. Acosta, Subathra Adithan, Pranav Rajpurkar

Abstract:Accurately interpreting medical images and writing radiology reports is a critical but challenging task in healthcare. Both human-written and AI-generated reports can contain errors, ranging from clinical inaccuracies to linguistic mistakes. To address this, we introduce ReXErr, a methodology that leverages Large Language Models to generate representative errors within chest X-ray reports. Working with board-certified radiologists, we developed error categories that capture common mistakes in both human and AI-generated reports. Our approach uses a novel sampling scheme to inject diverse errors while maintaining clinical plausibility. ReXErr demonstrates consistency across error categories and produces errors that closely mimic those found in real-world scenarios. This method has the potential to aid in the development and evaluation of report correction algorithms, potentially enhancing the quality and reliability of radiology reporting.

ReXamine-Global: A Framework for Uncovering Inconsistencies in Radiology Report Generation Metrics

Aug 29, 2024

Oishi Banerjee, Agustina Saenz, Kay Wu, Warren Clements, Adil Zia, Dominic Buensalido, Helen Kavnoudias, Alain S. Abi-Ghanem, Nour El Ghawi, Cibele Luna (+10 more)

Abstract:Given the rapidly expanding capabilities of generative AI models for radiology, there is a need for robust metrics that can accurately measure the quality of AI-generated radiology reports across diverse hospitals. We develop ReXamine-Global, an LLM-powered, multi-site framework that tests metrics across different writing styles and patient populations, exposing gaps in their generalization. First, our method tests whether a metric is undesirably sensitive to reporting style, providing different scores depending on whether AI-generated reports are stylistically similar to ground-truth reports or not. Second, our method measures whether a metric reliably agrees with experts, or whether metric and expert scores of AI-generated report quality diverge for some sites. Using 240 reports from 6 hospitals around the world, we apply ReXamine-Global to 7 established report evaluation metrics and uncover serious gaps in their generalizability. Developers can apply ReXamine-Global when designing new report evaluation metrics, ensuring their robustness across sites. Additionally, our analysis of existing metrics can guide users of those metrics towards evaluation procedures that work reliably at their sites of interest.

Uncovering Knowledge Gaps in Radiology Report Generation Models through Knowledge Graphs

Aug 26, 2024

Xiaoman Zhang, Julián N. Acosta, Hong-Yu Zhou, Pranav Rajpurkar

Abstract:Recent advancements in artificial intelligence have significantly improved the automatic generation of radiology reports. However, existing evaluation methods fail to reveal the models' understanding of radiological images and their capacity to achieve human-level granularity in descriptions. To bridge this gap, we introduce a system, named ReXKG, which extracts structured information from processed reports to construct a comprehensive radiology knowledge graph. We then propose three metrics to evaluate the similarity of nodes (ReXKG-NSC), distribution of edges (ReXKG-AMS), and coverage of subgraphs (ReXKG-SCS) across various knowledge graphs. We conduct an in-depth comparative analysis of AI-generated and human-written radiology reports, assessing the performance of both specialist and generalist models. Our study provides a deeper understanding of the capabilities and limitations of current AI models in radiology report generation, offering valuable insights for improving model performance and clinical applicability.

* Code is available at: https://github.com/rajpurkarlab/ReXKG
