Timothy Miller

The impact of using an AI chatbot to respond to patient messages

Oct 26, 2023
Shan Chen, Marco Guevara, Shalini Moningi, Frank Hoebers, Hesham Elhalawani, Benjamin H. Kann, Fallon E. Chipidza, Jonathan Leeman, Hugo J. W. L. Aerts, Timothy Miller, Guergana K. Savova, Raymond H. Mak, Maryam Lustberg, Majid Afshar, Danielle S. Bitterman

Documentation burden is a major contributor to clinician burnout, which is rising nationally and is an urgent threat to our ability to care for patients. Artificial intelligence (AI) chatbots, such as ChatGPT, could reduce clinician burden by assisting with documentation. Although many hospitals are actively integrating such systems into electronic medical record systems, the utility of AI chatbots and their impact on clinical decision-making have not been studied for this intended use. We are the first to examine the utility of large language models in assisting clinicians in drafting responses to patient questions. In our two-stage cross-sectional study, 6 oncologists responded to 100 realistic synthetic cancer patient scenarios and portal messages developed to reflect common medical situations, first manually, then with AI assistance. We find that AI-assisted responses were longer and less readable, but provided acceptable drafts without edits 58% of the time. AI assistance improved efficiency 77% of the time, with a low risk of harm (82% safe); however, 7.7% of unedited AI responses could cause severe harm. In 31% of cases, physicians thought the AI drafts were human-written. AI assistance also led to more patient education recommendations and fewer clinical actions than manual responses. These results show promise for AI to improve clinician efficiency and patient care through assisted documentation, if used judiciously. Monitoring model outputs and human-AI interaction remains crucial for safe implementation.
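A minimal sketch of the kind of length and readability comparison reported above, using the open-source textstat package; both example messages are hypothetical, not study data.

```python
# Compare word count and Flesch reading ease (higher = more readable)
# for a manual vs. an AI-assisted draft. Example texts are hypothetical.
import textstat

manual_draft = "Your scan looks stable. Please continue your current medication."
ai_draft = (
    "Thank you for reaching out. I have reviewed your recent imaging, and the "
    "findings appear stable compared with your prior study. Please continue "
    "your current medication regimen, and contact our office if new symptoms arise."
)

for label, text in [("manual", manual_draft), ("AI-assisted", ai_draft)]:
    words = len(text.split())
    ease = textstat.flesch_reading_ease(text)
    print(f"{label}: {words} words, Flesch reading ease = {ease:.1f}")
```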

* 4 figures and tables in main, submitted for review 

Leveraging A Medical Knowledge Graph into Large Language Models for Diagnosis Prediction

Aug 28, 2023
Yanjun Gao, Ruizhe Li, John Caskey, Dmitriy Dligach, Timothy Miller, Matthew M. Churpek, Majid Afshar

Electronic Health Records (EHRs) and routine documentation practices play a vital role in patients' daily care, providing a holistic record of health, diagnoses, and treatment. However, complex and verbose EHR narratives overload healthcare providers, risking diagnostic inaccuracies. While Large Language Models (LLMs) have showcased their potential in diverse language tasks, their application in healthcare must minimize diagnostic errors and prevent patient harm. In this paper, we outline an approach for improving the proficiency of LLMs at automated diagnosis generation by incorporating a medical knowledge graph (KG) and a novel graph model, Dr.Knows, inspired by the clinical diagnostic reasoning process. We derive the KG from the National Library of Medicine's Unified Medical Language System (UMLS), a robust repository of biomedical knowledge. Our method requires no pre-training and instead leverages the KG as an auxiliary instrument that aids in interpreting and summarizing complex medical concepts. Using real-world hospital datasets, our experimental results demonstrate that combining LLMs with a KG has the potential to improve the accuracy of automated diagnosis generation. More importantly, our approach offers an explainable diagnostic pathway, edging us closer to AI-augmented diagnostic decision support systems.
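A schematic sketch of knowledge-graph-augmented prompting in the spirit of the approach above (not the actual Dr.Knows model); the toy graph and concept names are hypothetical, whereas the paper derives its graph from UMLS.

```python
# Toy illustration: collect candidate diagnoses linked to concepts found in
# a note, then prepend them as context for a generative model. Hypothetical
# mini-graph; the real system uses a UMLS-derived knowledge graph.
from collections import defaultdict

kg = defaultdict(list)
kg["shortness of breath"] += ["heart failure", "pneumonia", "COPD"]
kg["elevated BNP"] += ["heart failure"]

def kg_candidates(concepts, graph):
    """Union of diagnoses reachable in one hop from the input concepts."""
    candidates = set()
    for concept in concepts:
        candidates.update(graph.get(concept, []))
    return sorted(candidates)

note_concepts = ["shortness of breath", "elevated BNP"]
prompt = (
    f"Findings: {', '.join(note_concepts)}.\n"
    f"Knowledge graph candidates: {', '.join(kg_candidates(note_concepts, kg))}.\n"
    "List the most likely diagnoses:"
)
print(prompt)
```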

* Under review 

Multi-Task Training with In-Domain Language Models for Diagnostic Reasoning

Jun 13, 2023
Brihat Sharma, Yanjun Gao, Timothy Miller, Matthew M. Churpek, Majid Afshar, Dmitriy Dligach

Generative artificial intelligence (AI) is a promising direction for augmenting clinical diagnostic decision support and reducing diagnostic errors, a leading contributor to medical errors. To further the development of clinical AI systems, the Diagnostic Reasoning Benchmark (DR.BENCH) was introduced as a comprehensive generative AI framework comprising six tasks that represent key components of clinical reasoning. We present a comparative analysis of in-domain versus out-of-domain language models, as well as multi-task versus single-task training, with a focus on the problem summarization task in DR.BENCH (Gao et al., 2023). We demonstrate that a multi-task, clinically trained language model outperforms its general-domain counterpart by a large margin, establishing a new state-of-the-art performance with a ROUGE-L score of 28.55. This research underscores the value of domain-specific training for optimizing clinical diagnostic reasoning tasks.
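The 28.55 figure above is a ROUGE-L score; a minimal example of computing that metric with Google's rouge-score package, on hypothetical summaries:

```python
# Compute ROUGE-L F1 between a gold and a generated problem summary.
# The two strings are hypothetical examples, not data from the paper.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
reference = "acute kidney injury; community acquired pneumonia; anemia"
prediction = "acute kidney injury; pneumonia"
scores = scorer.score(reference, prediction)
print(f"ROUGE-L F1: {scores['rougeL'].fmeasure:.4f}")
```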

* Accepted to the Proceedings of the 5th Clinical NLP Workshop at ACL 

Overview of the Problem List Summarization (ProbSum) 2023 Shared Task on Summarizing Patients' Active Diagnoses and Problems from Electronic Health Record Progress Notes

Jun 08, 2023
Yanjun Gao, Dmitriy Dligach, Timothy Miller, Matthew M. Churpek, Majid Afshar

The BioNLP Workshop 2023 launched a shared task on Problem List Summarization (ProbSum) in January 2023. The aim of this shared task is to attract future research efforts in building NLP models for real-world diagnostic decision support applications, where a system that generates relevant and accurate diagnoses will augment healthcare providers' decision-making and improve the quality of care for patients. The goal for participants is to develop models that generate a list of diagnoses and problems from the daily care notes collected during the hospitalization of critically ill patients. Eight teams submitted their final systems to the shared task leaderboard. In this paper, we describe the tasks, datasets, evaluation metrics, and baseline systems, and summarize the techniques and results of the approaches tried by the participating teams.
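A simplified sketch of scoring a predicted problem list against a gold list with set-based F1, in the spirit of (though not identical to) the shared task's concept-level evaluation; the lists are hypothetical.

```python
# Set-based precision/recall/F1 over predicted vs. gold problem lists.
# A simplification of concept-level scoring; the lists are hypothetical.
def list_f1(predicted, gold):
    pred, ref = set(predicted), set(gold)
    tp = len(pred & ref)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gold = ["sepsis", "acute kidney injury", "anemia"]
predicted = ["sepsis", "anemia", "hypotension"]
print(f"F1 = {list_f1(predicted, gold):.2f}")  # 2/3 precision and recall -> 0.67
```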

* To appear in the Proceedings of the 5th BioNLP Workshop at ACL 

Progress Note Understanding -- Assessment and Plan Reasoning: Overview of the 2022 N2C2 Track 3 Shared Task

Mar 14, 2023
Yanjun Gao, Dmitriy Dligach, Timothy Miller, Matthew M Churpek, Ozlem Uzuner, Majid Afshar

Daily progress notes are a common note type in the electronic health record (EHR), where healthcare providers document the patient's daily progress and treatment plans. The EHR is designed to document all the care provided to patients, but it also enables note bloat: extraneous information that distracts from the diagnoses and treatment plans. Applying natural language processing (NLP) to the EHR is a growing field, with the majority of methods focused on information extraction; few tasks use NLP methods for downstream diagnostic decision support. We introduced the 2022 National NLP Clinical Challenge (N2C2) Track 3: Progress Note Understanding - Assessment and Plan Reasoning as one step towards a new suite of tasks. The Assessment and Plan Reasoning task focuses on the most critical components of progress notes, the Assessment and Plan subsections, where health problems and diagnoses are contained. The goal of the task was to develop and evaluate NLP systems that automatically predict causal relations between the overall patient status in the Assessment section and each component of the Plan section, which contains the diagnoses and treatment plans. Identifying and prioritizing diagnoses in this way is a first step in diagnostic decision support for finding the most relevant information in long documents like daily progress notes. We present the results of the 2022 N2C2 Track 3 and describe the data, evaluation, participation, and system performance.
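One plausible way to frame the task described above is sentence-pair classification over (Assessment, Plan item) pairs; a hedged Hugging Face sketch, where the checkpoint is a generic placeholder, the classification head is untrained, and the label names paraphrase the task's relation categories.

```python
# Frame Assessment-Plan relation prediction as sentence-pair classification.
# "bert-base-uncased" is a placeholder checkpoint with a freshly initialized
# (untrained) head; labels paraphrase the task's relation categories.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

labels = ["direct", "indirect", "neither", "not relevant"]
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(labels)
)

assessment = "68M with decompensated heart failure, improving on diuresis."
plan_item = "Continue IV furosemide, strict I/Os, daily weights."
inputs = tokenizer(assessment, plan_item, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
print(labels[logits.argmax(-1).item()])  # untrained head: output is arbitrary
```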

* To appear in Journal of Biomedical Informatics 

DR.BENCH: Diagnostic Reasoning Benchmark for Clinical Natural Language Processing

Sep 29, 2022
Yanjun Gao, Dmitriy Dligach, Timothy Miller, John Caskey, Brihat Sharma, Matthew M Churpek, Majid Afshar

The meaningful use of electronic health records (EHR) continues to progress in the digital era, with clinical decision support systems augmented by artificial intelligence. A priority in improving the provider experience is to overcome information overload and reduce cognitive burden, so that fewer medical errors and cognitive biases are introduced during patient care. One major type of medical error is diagnostic error, due to systematic or predictable errors in judgment that rely on heuristics. The potential for clinical natural language processing (cNLP) to model human diagnostic reasoning, with forward reasoning from data to diagnosis, and thereby reduce cognitive burden and medical error, has not been investigated. Existing tasks to advance the science in cNLP have largely focused on information extraction and named entity recognition through classification tasks. We introduce a novel suite of tasks, the Diagnostic Reasoning Benchmark (DR.BENCH), as a new benchmark for developing and evaluating cNLP models with clinical diagnostic reasoning ability. The suite includes six tasks from ten publicly available datasets addressing clinical text understanding, medical knowledge reasoning, and diagnosis generation. DR.BENCH is the first clinical suite of tasks designed as a natural language generation framework for evaluating pre-trained language models. Experiments with state-of-the-art pre-trained generative language models, using both large general-domain models and models continually trained on a medical corpus, demonstrate opportunities for improvement when evaluated on DR.BENCH. We share DR.BENCH as a publicly available GitLab repository with a systematic approach to load and evaluate models for the cNLP community.
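Since DR.BENCH poses every task as natural language generation, a brief sketch of that text-to-text framing with a small T5 model; the prompt wording is illustrative, not DR.BENCH's actual template.

```python
# Text-to-text framing: even classification-like clinical tasks are posed
# as generation. Prompt wording is illustrative, not the benchmark's own.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

prompt = (
    "Summarize the active problems: 72F admitted with fever, cough, and "
    "hypoxia; chest x-ray shows right lower lobe consolidation."
)
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```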

* Under review 

Summarizing Patients Problems from Hospital Progress Notes Using Pre-trained Sequence-to-Sequence Models

Aug 17, 2022
Yanjun Gao, Dmitriy Dligach, Timothy Miller, Dongfang Xu, Matthew M. Churpek, Majid Afshar

Automatically summarizing patients' main problems from daily progress notes using natural language processing methods helps to combat information and cognitive overload in hospital settings and can potentially assist providers with computerized diagnostic decision support. Problem list summarization requires a model to understand, abstract, and generate clinical documentation. In this work, we propose a new NLP task that aims to generate a list of problems in a patient's daily care plan using input from the provider's progress notes during hospitalization. We investigate the performance of T5 and BART, two state-of-the-art seq2seq transformer architectures, in solving this problem. We provide a corpus built on top of publicly available progress notes from the Medical Information Mart for Intensive Care (MIMIC)-III electronic health records. T5 and BART are trained on general-domain text, so we experiment with a data augmentation method and a domain adaptation pre-training method to increase their exposure to medical vocabulary and knowledge. Evaluation methods include ROUGE, BERTScore, cosine similarity on sentence embeddings, and F-score on medical concepts. Results show that T5 with domain-adaptive pre-training achieves significant performance gains compared to a rule-based system and general-domain pre-trained language models, indicating a promising direction for tackling the problem summarization task.
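A minimal sketch of the seq2seq fine-tuning objective behind this setup, shown here with BART; the note and target problem list are hypothetical, and the optimizer loop is omitted.

```python
# One supervised step of seq2seq fine-tuning: the model is trained to emit
# the problem list given the note. Texts are hypothetical; the paper trains
# on MIMIC-III progress notes.
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

note = "Subjective: less dyspneic overnight. Objective: afebrile, SpO2 95% on 2L."
problems = "acute hypoxic respiratory failure; pneumonia"

inputs = tokenizer(note, return_tensors="pt", truncation=True)
labels = tokenizer(problems, return_tensors="pt").input_ids
loss = model(**inputs, labels=labels).loss  # cross-entropy over target tokens
loss.backward()  # a gradient step would follow (optimizer omitted)
```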

* Paper is accepted to COLING 2022 

Hierarchical Annotation for Building A Suite of Clinical Natural Language Processing Tasks: Progress Note Understanding

Apr 06, 2022
Yanjun Gao, Dmitriy Dligach, Timothy Miller, Samuel Tesch, Ryan Laffin, Matthew M. Churpek, Majid Afshar

Applying natural language processing methods to electronic health record (EHR) data is a growing field. Existing corpora and annotations focus on modeling textual features and relation prediction; however, there is a paucity of annotated corpora built to model clinical diagnostic thinking, a process involving text understanding, domain knowledge abstraction, and reasoning. This work introduces a hierarchical annotation schema with three stages addressing clinical text understanding, clinical reasoning, and summarization. We created an annotated corpus based on an extensive collection of publicly available daily progress notes, a type of EHR documentation that is collected in time series in a problem-oriented format. The conventional format for a progress note follows the Subjective, Objective, Assessment, and Plan (SOAP) headings. We also define a new suite of tasks, Progress Note Understanding, with three tasks utilizing the three annotation stages. This novel suite of tasks was designed to train and evaluate future NLP models for clinical text understanding, clinical knowledge representation, inference, and summarization.
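Since the annotation is organized around the SOAP structure, a simple illustrative sketch of splitting a note into its SOAP sections with regular expressions; real notes vary widely in headings, so this is a toy example.

```python
# Split a progress note into SOAP sections by heading. Toy example only;
# real progress notes use far more varied section headings.
import re

note = """Subjective: Patient reports improved pain.
Objective: T 98.6, BP 120/80. Labs stable.
Assessment: POD2 from appendectomy, recovering well.
Plan: Advance diet. Begin discharge planning."""

parts = re.split(r"(Subjective|Objective|Assessment|Plan):", note)
# re.split with a capture group yields [pre, heading, body, heading, body, ...]
sections = {h: b.strip() for h, b in zip(parts[1::2], parts[2::2])}
print(sections["Assessment"])
```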

* To appear in 13th Language Resources and Evaluation Conference (LREC 2022) 

A Scoping Review of Publicly Available Language Tasks in Clinical Natural Language Processing

Dec 07, 2021
Yanjun Gao, Dmitriy Dligach, Leslie Christensen, Samuel Tesch, Ryan Laffin, Dongfang Xu, Timothy Miller, Ozlem Uzuner, Matthew M Churpek, Majid Afshar

Objective: To provide a scoping review of papers on clinical natural language processing (NLP) tasks that use publicly available electronic health record data from a cohort of patients. Materials and Methods: We searched six databases, including biomedical research and computer science literature databases. A round of title/abstract screening and a round of full-text screening were conducted by two reviewers. Our method followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Results: A total of 35 papers with 47 clinical NLP tasks published between 2007 and 2021 met the inclusion criteria. We categorized the tasks by the type of NLP problem, including named entity recognition, summarization, and other NLP tasks. Some tasks were introduced with a clinical decision support application in mind, such as substance abuse detection, phenotyping, and cohort selection for clinical trials. We summarized the tasks by publication and dataset information. Discussion: The breadth of clinical NLP tasks keeps growing as the field of NLP evolves with advancements in language systems. However, gaps exist between the divergent interests of the general-domain NLP community and the clinical informatics community, and in the generalizability of the data sources. We also identified issues in data selection and preparation, including the lack of time-sensitive data and concerns about the validity of problem size and evaluation. Conclusions: The existing clinical NLP tasks cover a wide range of topics, and the field will continue to grow and attract more attention from both the general-domain NLP and clinical informatics communities. We encourage future work to incorporate multi-disciplinary collaboration, reporting transparency, and standardization in data preparation.

* Paper submitted to Journal of American Medical Informatics Association (JAMIA) 

BCH-NLP at BioCreative VII Track 3: medications detection in tweets using transformer networks and multi-task learning

Nov 26, 2021
Dongfang Xu, Shan Chen, Timothy Miller

In this paper, we present our work in the BioCreative VII Track 3 shared task on automatic extraction of medication names from tweets, for which we implemented a multi-task learning model jointly trained on text classification and sequence labelling. Our best system run achieved a strict F1 of 80.4, ranking first and scoring more than 10 points higher than the average of all participants. Our analyses show that ensembling, multi-task learning, and data augmentation are all beneficial for medication detection in tweets.
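A schematic PyTorch sketch of the joint setup described above: a shared encoder feeding both a token-level sequence-labelling head and a tweet-level classification head. The dimensions, label counts, and checkpoint are illustrative, not the authors' exact configuration.

```python
# Multi-task model: shared transformer encoder with two heads, one for
# per-token BIO tagging and one for tweet-level classification. Sizes and
# the checkpoint are illustrative, not the paper's exact configuration.
import torch.nn as nn
from transformers import AutoModel

class MultiTaskMedTagger(nn.Module):
    def __init__(self, encoder_name="bert-base-uncased", num_tags=3):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        self.tag_head = nn.Linear(hidden, num_tags)  # BIO tags per token
        self.cls_head = nn.Linear(hidden, 2)         # medication mention: yes/no

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        token_logits = self.tag_head(out.last_hidden_state)      # sequence labelling
        cls_logits = self.cls_head(out.last_hidden_state[:, 0])  # [CLS] pooling
        return token_logits, cls_logits
```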

* Proceedings of the seventh BioCreative challenge evaluation workshop (https://biocreative.bioinformatics.udel.edu/resources/publications/bc-vii-workshop-proceedings/) 