Xinlu Zhang

Enhancing Small Medical Learners with Privacy-preserving Contextual Prompting

May 22, 2023
Xinlu Zhang, Shiyang Li, Xianjun Yang, Chenxin Tian, Yao Qin, Linda Ruth Petzold

Large language models (LLMs) demonstrate remarkable medical expertise, but data privacy concerns impede their direct use in healthcare environments. Although they offer improved data privacy protection, domain-specific small language models (SLMs) often underperform LLMs, emphasizing the need for methods that reduce this performance gap while alleviating privacy concerns. In this paper, we present a simple yet effective method that harnesses LLMs' medical proficiency to boost SLM performance on medical tasks under privacy-restricted scenarios. Specifically, we mitigate patient privacy issues by extracting keywords from medical data and prompting the LLM to generate a medical knowledge-intensive context by simulating clinicians' thought processes. This context serves as additional input for SLMs, augmenting their decision-making capabilities. Our method significantly enhances performance in both few-shot and full training settings across three medical knowledge-intensive tasks, achieving up to a 22.57% increase in absolute accuracy compared to SLM fine-tuning without context, and sets new state-of-the-art results on two medical tasks within privacy-restricted scenarios. Further out-of-domain testing and experiments on two general-domain datasets showcase its generalizability and broad applicability.
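
The pipeline described in this abstract has three stages: local keyword extraction, LLM context generation, and SLM input augmentation. Below is a minimal Python sketch of that flow; the helper names, prompt wording, and keyword-matching extractor are illustrative assumptions, not the authors' exact implementation.

    from typing import Callable

    def extract_keywords(record: str, medical_vocab: set[str]) -> list[str]:
        # Privacy step: only de-identified medical terms (symptoms, drugs,
        # findings) leave the local environment, never the raw record.
        return [w for w in record.lower().split() if w in medical_vocab]

    def generate_context(keywords: list[str], llm: Callable[[str], str]) -> str:
        # Prompt the LLM to simulate a clinician's thought process over the
        # keywords alone, yielding a medical knowledge-intensive context.
        prompt = ("Think as a clinician: reason step by step about a patient "
                  "presenting with " + ", ".join(keywords) + ".")
        return llm(prompt)

    def build_slm_input(question: str, context: str) -> str:
        # The generated context serves as additional input for the SLM.
        return context + "\n\n" + question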

Exploring the Limits of ChatGPT for Query or Aspect-based Text Summarization

Feb 16, 2023
Xianjun Yang, Yan Li, Xinlu Zhang, Haifeng Chen, Wei Cheng

Text summarization has been a crucial problem in natural language processing (NLP) for several decades. It aims to condense lengthy documents into shorter versions while retaining the most critical information. Various methods have been proposed for text summarization, including extractive and abstractive summarization. The emergence of large language models (LLMs) like GPT-3 and ChatGPT has recently created significant interest in using these models for text summarization tasks. Recent studies (Goyal et al., 2022; Zhang et al., 2023) have shown that LLM-generated news summaries are already on par with human ones. However, the performance of LLMs on more practical applications like aspect- or query-based summarization is underexplored. To fill this gap, we evaluated ChatGPT's performance on four widely used benchmark datasets, encompassing diverse summaries from Reddit posts, news articles, dialogue meetings, and stories. Our experiments reveal that ChatGPT's performance is comparable to that of traditional fine-tuning methods in terms of ROUGE scores. Moreover, we highlight some unique differences between ChatGPT-generated summaries and human references, providing valuable insights into ChatGPT's strengths on diverse text summarization tasks. Our findings call for new directions in this area, and we plan to conduct further research to systematically examine the characteristics of ChatGPT-generated summaries through extensive human evaluation.

* Work in progress 
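
To make the evaluation recipe concrete, here is a minimal sketch of a query-focused scoring loop; the prompt wording and the chat_gpt() callable are assumptions, while the rouge_score usage follows that library's standard API.

    from rouge_score import rouge_scorer

    scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"],
                                      use_stemmer=True)

    def evaluate(examples, chat_gpt):
        # Each example carries a document, a query/aspect, and a reference summary.
        results = []
        for ex in examples:
            prompt = (f"Summarize the following text with respect to "
                      f"'{ex['query']}':\n{ex['document']}")
            prediction = chat_gpt(prompt)
            results.append(scorer.score(ex["reference"], prediction))
        return results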

Explanations from Large Language Models Make Small Reasoners Better

Oct 13, 2022
Shiyang Li, Jianshu Chen, Yelong Shen, Zhiyu Chen, Xinlu Zhang, Zekun Li, Hong Wang, Jing Qian, Baolin Peng, Yi Mao, Wenhu Chen, Xifeng Yan

Integrating free-text explanations into the in-context learning of large language models (LLMs) has been shown to elicit strong reasoning capabilities along with reasonable explanations. In this paper, we consider the problem of leveraging explanations generated by LLMs to improve the training of small reasoners, which are more favorable for real-world production deployment due to their low cost. We systematically explore three approaches for generating explanations from LLMs and utilize a multi-task learning framework that helps small models acquire strong reasoning power together with explanation generation capabilities. Experiments on multiple reasoning tasks show that our method consistently and significantly outperforms fine-tuning baselines across different settings, and even performs better than fine-tuning/prompting a 60x larger GPT-3 (175B) model by up to 9.5% in accuracy. As a side benefit, human evaluation further shows that our method generates high-quality explanations to justify its predictions, moving towards the goal of explainable AI.
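
As a rough illustration of the multi-task framework, the sketch below combines an answer-prediction loss with an explanation-generation loss, assuming a HuggingFace-style seq2seq model; the alpha weight and the two input encodings are illustrative assumptions, not the paper's exact setup.

    def multitask_loss(model, batch, alpha=0.5):
        # Task 1: predict the label from the question.
        answer_loss = model(input_ids=batch["answer_inputs"],
                            labels=batch["answer_labels"]).loss
        # Task 2: generate the LLM-provided free-text explanation.
        explain_loss = model(input_ids=batch["explain_inputs"],
                             labels=batch["explanation_labels"]).loss
        # Joint objective: reasoning power plus explanation capability.
        return alpha * answer_loss + (1 - alpha) * explain_loss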

Multiple Organ Failure Prediction with Classifier-Guided Generative Adversarial Imputation Networks

Jun 22, 2021
Xinlu Zhang, Yun Zhao, Rachael Callcut, Linda Petzold

Multiple organ failure (MOF) is a severe syndrome with a high mortality rate among Intensive Care Unit (ICU) patients. Early and precise detection is critical for clinicians to make timely decisions. An essential challenge in applying machine learning models to electronic health records (EHRs) is the pervasiveness of missing values. Most existing imputation methods operate in the data preprocessing phase and fail to capture the relationship between the data and the outcome for downstream predictions. In this paper, we propose classifier-guided generative adversarial imputation networks (Classifier-GAIN) for MOF prediction to bridge this gap, incorporating both observed data and label information. Specifically, the classifier takes imputed values from the generator (imputer) to predict task outcomes and provides additional supervision signals to the generator through joint training. The classifier-guided generator imputes missing values with label awareness during training, improving the classifier's performance during inference. We conduct extensive experiments showing that our approach consistently outperforms classical and state-of-the-art neural baselines across a range of missing-data scenarios and evaluation metrics.

* BioKDD 
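
A minimal sketch of the joint training signal, assuming standard GAIN-style inputs (a noise-filled data matrix plus an observation mask): the classification loss on imputed data flows back into the generator, which is the label awareness the abstract describes. The network definitions, the beta weight, and the omitted adversarial discriminator are simplifications.

    import torch
    import torch.nn.functional as F

    def joint_step(generator, classifier, x, mask, y, beta=1.0):
        # mask = 1 where a value is observed, 0 where it is missing.
        noise = torch.rand_like(x)
        imputed = generator(torch.where(mask.bool(), x, noise), mask)
        x_hat = mask * x + (1 - mask) * imputed        # keep observed entries
        recon_loss = F.mse_loss(mask * imputed, mask * x)   # fit observed data
        cls_loss = F.binary_cross_entropy(classifier(x_hat), y)  # label signal
        # Both losses reach the generator, so imputation becomes label-aware.
        return recon_loss + beta * cls_loss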

BERTSurv: BERT-Based Survival Models for Predicting Outcomes of Trauma Patients

Mar 19, 2021
Yun Zhao, Qinghang Hong, Xinlu Zhang, Yu Deng, Yuqing Wang, Linda Petzold

Survival analysis is a technique for predicting the time to specific outcomes, and is widely used to predict outcomes for intensive care unit (ICU) trauma patients. Recently, deep learning models have drawn increasing attention in healthcare. However, there is a lack of deep learning methods that can model the relationship between measurements, clinical notes, and mortality outcomes. In this paper, we introduce BERTSurv, a deep learning survival framework that applies Bidirectional Encoder Representations from Transformers (BERT) as a language representation model on unstructured clinical notes for mortality prediction and survival analysis. We also incorporate clinical measurements in BERTSurv. With binary cross-entropy (BCE) loss, BERTSurv predicts mortality as a binary outcome (mortality prediction). With partial log-likelihood (PLL) loss, BERTSurv predicts the probability of mortality as a time-to-event outcome (survival analysis). We apply BERTSurv to Medical Information Mart for Intensive Care III (MIMIC III) trauma patient data. For mortality prediction, BERTSurv obtains an area under the receiver operating characteristic curve (AUC-ROC) of 0.86, an improvement of 3.6% over a multilayer perceptron (MLP) baseline without notes. For survival analysis, BERTSurv achieves a concordance index (C-index) of 0.7. In addition, visualizations of BERT's attention heads help to extract patterns in clinical notes and improve model interpretability by showing how the model assigns weights to different inputs.

* ICDM 2021 
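
For reference, the two training objectives named in the abstract can be sketched as follows; the Cox partial log-likelihood form is the standard one, while the tensor names are illustrative.

    import torch
    import torch.nn.functional as F

    def mortality_loss(risk_logits, died):
        # Mortality prediction: binary cross-entropy on a binary outcome.
        return F.binary_cross_entropy_with_logits(risk_logits, died)

    def cox_pll_loss(risk, time, event):
        # Survival analysis: negative Cox partial log-likelihood.
        # Sort by survival time descending so each prefix is a risk set.
        order = torch.argsort(time, descending=True)
        risk, event = risk[order], event[order]
        log_cumsum = torch.logcumsumexp(risk, dim=0)  # log of risk-set sums
        # Only patients with an observed event contribute terms.
        return -torch.sum((risk - log_cumsum) * event) / event.sum()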