Jie Zhao

Large Language Models Illuminate a Progressive Pathway to Artificial Healthcare Assistant: A Review

Nov 03, 2023
Mingze Yuan, Peng Bao, Jiajia Yuan, Yunhao Shen, Zifan Chen, Yi Xie, Jie Zhao, Yang Chen, Li Zhang, Lin Shen, Bin Dong

With the rapid development of artificial intelligence, large language models (LLMs) have shown promising capabilities in mimicking human-level language comprehension and reasoning. This has sparked significant interest in applying LLMs to enhance various aspects of healthcare, ranging from medical education to clinical decision support. However, medicine involves multifaceted data modalities and nuanced reasoning skills, presenting challenges for integrating LLMs. This paper provides a comprehensive review of the applications and implications of LLMs in medicine. It begins by examining the fundamental applications of general-purpose and specialized LLMs, demonstrating their utility in knowledge retrieval, research support, clinical workflow automation, and diagnostic assistance. Recognizing the inherent multimodality of medicine, the review then focuses on multimodal LLMs, investigating their ability to process diverse data types such as medical imaging and EHRs to augment diagnostic accuracy. To address LLMs' limitations in personalization and complex clinical reasoning, the paper explores the emerging development of LLM-powered autonomous agents for healthcare. Furthermore, it summarizes evaluation methodologies for assessing LLMs' reliability and safety in medical contexts. Overall, this review offers an extensive analysis of the transformative potential of LLMs in modern medicine. It also highlights the pivotal need for continuous optimization and ethical oversight before these models can be effectively integrated into clinical practice. Visit https://github.com/mingze-yuan/Awesome-LLM-Healthcare for an accompanying GitHub repository containing the latest papers.

* 24 pages, 1 figure, 3 tables 

Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning

Oct 07, 2023
Murong Yue, Jie Zhao, Min Zhang, Liang Du, Ziyu Yao

Large language models (LLMs) such as GPT-4 have exhibited remarkable performance on a variety of tasks, but this strong performance often comes with the high expense of using paid API services. In this paper, we study how to build an LLM cascade to reduce the cost of using LLMs, particularly for reasoning (e.g., mathematical, causal) tasks. Our cascade pipeline follows the intuition that simpler questions can be addressed by a weaker but more affordable LLM, whereas only the challenging questions necessitate the stronger and more expensive LLM. To realize this decision-making, we treat the "answer consistency" of the weaker LLM as a signal of question difficulty and propose several methods for answer sampling and consistency checking, including one that leverages a mixture of two thought representations (i.e., Chain-of-Thought and Program-of-Thought). Through experiments on six reasoning benchmark datasets, with GPT-3.5-turbo and GPT-4 as the weaker and stronger LLMs, respectively, we demonstrate that our proposed LLM cascades can achieve performance comparable to using solely the stronger LLM while requiring only 40% of its cost.
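The routing logic can be summarized in a few lines. The sketch below is a minimal illustration of the cascade idea, assuming a hypothetical `sample_answers` helper that queries an LLM with a Chain-of-Thought ("cot") or Program-of-Thought ("pot") prompt; it is not the authors' released implementation, and the sample count and consistency threshold are placeholders.

```python
# Minimal sketch of an answer-consistency cascade (illustrative only).
from collections import Counter

def sample_answers(model: str, question: str, style: str, n: int) -> list[str]:
    """Stub: query `model` n times with a 'cot' or 'pot' prompt and
    return the extracted final answers. Plug in your own LLM client."""
    raise NotImplementedError

def cascade_answer(question: str,
                   weak: str = "gpt-3.5-turbo",
                   strong: str = "gpt-4",
                   n_samples: int = 6,
                   threshold: float = 0.8) -> str:
    # Mix the two thought representations when sampling from the weak model.
    votes = (sample_answers(weak, question, "cot", n_samples // 2)
             + sample_answers(weak, question, "pot", n_samples - n_samples // 2))
    answer, count = Counter(votes).most_common(1)[0]
    # High answer consistency -> treat the question as easy and keep the
    # weak model's answer; otherwise escalate to the stronger, pricier model.
    if count / len(votes) >= threshold:
        return answer
    return sample_answers(strong, question, "cot", 1)[0]
```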


Few-Shot Domain Adaptation for Charge Prediction on Unprofessional Descriptions

Sep 29, 2023
Jie Zhao, Ziyu Guan, Wei Zhao, Yue Jiang, Xiaofei He

Recent works on professional legal-linguistic style (PLLS) texts have shown promising results on the charge prediction task. However, unprofessional users also show an increasing demand for such a prediction service. There is a clear domain discrepancy between PLLS texts and the non-PLLS texts written by laypersons, which degrades current SOTA models' performance on non-PLLS texts. A key challenge is the scarcity of non-PLLS data for most charge classes. This paper proposes a novel few-shot domain adaptation (FSDA) method named Disentangled Legal Content for Charge Prediction (DLCCP). Compared with existing FSDA works, which solely perform instance-level alignment without considering the negative impact of the text-style information present in latent features, DLCCP (1) disentangles content and style representations for better domain-invariant legal content learning, with carefully designed optimization goals for the content and style spaces, and (2) employs the constitutive-elements knowledge of charges to extract and align element-level and instance-level content representations simultaneously. We contribute the first publicly available non-PLLS dataset, named NCCP, for developing layperson-friendly charge prediction models. Experiments on NCCP show the superiority of our method over competitive baselines.
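As a rough illustration of the disentanglement idea (not the actual DLCCP architecture), the PyTorch sketch below splits an encoder output into content and style heads, classifies charges from content features only, and adds simple orthogonality and cross-domain alignment terms; all module sizes and loss weights are assumptions.

```python
# Illustrative content/style disentanglement with cross-domain alignment.
import torch
import torch.nn as nn

class DisentangledEncoder(nn.Module):
    def __init__(self, hidden=768, dim=256, n_charges=100):
        super().__init__()
        self.content_head = nn.Linear(hidden, dim)   # domain-invariant legal content
        self.style_head = nn.Linear(hidden, dim)     # writing-style information
        self.classifier = nn.Linear(dim, n_charges)  # charge prediction from content only

    def forward(self, text_emb):
        return self.content_head(text_emb), self.style_head(text_emb)

def training_loss(model, src_emb, src_labels, tgt_emb, tgt_labels):
    ce = nn.CrossEntropyLoss()
    src_c, src_s = model(src_emb)   # PLLS (source) batch
    tgt_c, tgt_s = model(tgt_emb)   # few-shot non-PLLS (target) batch
    cls = ce(model.classifier(src_c), src_labels) + ce(model.classifier(tgt_c), tgt_labels)
    # Keep content and style features orthogonal (a common disentanglement surrogate).
    ortho = (src_c * src_s).sum(-1).pow(2).mean() + (tgt_c * tgt_s).sum(-1).pow(2).mean()
    # Align instance-level content statistics across the two domains.
    align = (src_c.mean(0) - tgt_c.mean(0)).pow(2).sum()
    return cls + 0.1 * ortho + 0.1 * align
```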


Leveraging the Power of Data Augmentation for Transformer-based Tracking

Sep 15, 2023
Jie Zhao, Johan Edstedt, Michael Felsberg, Dong Wang, Huchuan Lu

Due to long-range correlation modeling and powerful pretrained models, transformer-based methods have initiated a breakthrough in visual object tracking performance. Previous works focus on designing effective architectures suited for tracking, but ignore that data augmentation is equally crucial for training a well-performing model. In this paper, we first explore the impact of general data augmentations on transformer-based trackers via systematic experiments, and reveal the limited effectiveness of these common strategies. Motivated by these observations, we then propose two data augmentation methods customized for tracking. First, we optimize existing random cropping via a dynamic search-radius mechanism and simulation of boundary samples. Second, we propose a token-level feature mixing augmentation strategy, which strengthens the model against challenges like background interference. Extensive experiments on two transformer-based trackers and six benchmarks demonstrate the effectiveness and data efficiency of our methods, especially under challenging settings such as one-shot tracking and small image resolutions.
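A loose sketch of what token-level feature mixing could look like is shown below, assuming token features of shape (batch, tokens, channels) and a hypothetical distractor frame; the paper's exact mixing recipe may differ.

```python
# Illustrative token-level feature mixing for tracker training.
import torch

def token_mixing(search_tokens: torch.Tensor,
                 distractor_tokens: torch.Tensor,
                 mix_ratio: float = 0.1) -> torch.Tensor:
    """Replace a random subset of search-region tokens with tokens taken from
    another (distractor) frame, forcing the tracker to cope with background
    interference at the feature level."""
    b, n, _ = search_tokens.shape
    n_mix = max(1, int(n * mix_ratio))
    mixed = search_tokens.clone()
    for i in range(b):
        dst = torch.randperm(n)[:n_mix]   # token positions to overwrite
        src = torch.randperm(n)[:n_mix]   # token positions to copy from the distractor
        mixed[i, dst] = distractor_tokens[i, src]
    return mixed
```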

* 10 pages, 5 figures, 7 tables 

Utilizing Large Language Models for Natural Interface to Pharmacology Databases

Jul 26, 2023
Hong Lu, Chuan Li, Yinheng Li, Jie Zhao

The drug development process necessitates that pharmacologists undertake various tasks, such as reviewing literature, formulating hypotheses, designing experiments, and interpreting results. Each stage requires accessing and querying vast amounts of information. In this abstract, we introduce a Large Language Model (LLM)-based Natural Language Interface designed to interact with structured information stored in databases. Our experiments demonstrate the feasibility and effectiveness of the proposed framework. This framework can generalize to query a wide range of pharmaceutical data and knowledge bases.
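The general pattern, an LLM translating a natural-language question into SQL that is executed against the database, can be sketched as follows; the schema, the `complete` helper, and the prompt are illustrative assumptions, not the authors' actual interface.

```python
# Illustrative text-to-SQL interface over a local SQLite pharmacology database.
import sqlite3

SCHEMA = """
CREATE TABLE drugs(drug_id INTEGER PRIMARY KEY, name TEXT, mechanism TEXT);
CREATE TABLE targets(target_id INTEGER PRIMARY KEY, drug_id INTEGER, gene TEXT, affinity REAL);
"""

def complete(prompt: str) -> str:
    """Stub: return the LLM's completion for `prompt`. Plug in your own client."""
    raise NotImplementedError

def answer_question(question: str, db_path: str = "pharma.db") -> list[tuple]:
    prompt = (f"Given the SQLite schema:\n{SCHEMA}\n"
              f"Write a single SQL query answering: {question}\n"
              "Return only SQL.")
    sql = complete(prompt)                       # natural language -> SQL
    with sqlite3.connect(db_path) as conn:
        return conn.execute(sql).fetchall()      # structured rows back to the user
```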

* BIOKDD 2023 abstract track 

End-to-end Reinforcement Learning for Online Coverage Path Planning in Unknown Environments

Jun 29, 2023
Arvi Jonnarth, Jie Zhao, Michael Felsberg

Coverage path planning is the problem of finding the shortest path that covers the entire free space of a given confined area, with applications ranging from robotic lawn mowing and vacuum cleaning, to demining and search-and-rescue tasks. While offline methods can find provably complete, and in some cases optimal, paths for known environments, their value is limited in online scenarios where the environment is not known beforehand, especially in the presence of non-static obstacles. We propose an end-to-end reinforcement learning-based approach in continuous state and action space for the online coverage path planning problem, capable of handling unknown environments. We construct the observation space from both global maps and local sensory inputs, allowing the agent to plan a long-term path, and simultaneously act on short-term obstacle detections. To account for large-scale environments, we propose to use a multi-scale map input representation. Furthermore, we propose a novel total variation reward term for eliminating thin strips of uncovered space in the learned path. To validate the effectiveness of our approach, we perform extensive experiments in simulation with a distance sensor, surpassing the performance of a recent reinforcement learning-based approach.
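For intuition, a total-variation-style reward term on the binary coverage map could look like the sketch below; the weight and sign convention are illustrative, not the paper's exact formulation.

```python
# Illustrative total-variation reward on a binary coverage map.
import numpy as np

def total_variation_reward(coverage: np.ndarray, weight: float = 0.01) -> float:
    """coverage: 2D array in {0, 1}, where 1 marks a covered cell."""
    cov = coverage.astype(float)
    tv = (np.abs(np.diff(cov, axis=0)).sum()   # vertical coverage transitions
          + np.abs(np.diff(cov, axis=1)).sum()) # horizontal coverage transitions
    # Lower total variation (a smoother coverage frontier, fewer thin uncovered
    # strips) yields a higher reward.
    return -weight * tv
```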


MedFMC: A Real-world Dataset and Benchmark For Foundation Model Adaptation in Medical Image Classification

Jun 16, 2023
Dequan Wang, Xiaosong Wang, Lilong Wang, Mengzhang Li, Qian Da, Xiaoqiang Liu, Xiangyu Gao, Jun Shen, Junjun He, Tian Shen, Qi Duan, Jie Zhao, Kang Li, Yu Qiao, Shaoting Zhang

Foundation models, often pre-trained with large-scale data, have achieved paramount success in jump-starting various vision and language applications. Recent advances further enable adapting foundation models to downstream tasks efficiently using only a few training samples, e.g., in-context learning. Yet, the application of such learning paradigms in medical image analysis remains scarce due to the shortage of publicly accessible data and benchmarks. In this paper, we aim at approaches for adapting foundation models to medical image classification and present a novel dataset and benchmark for the evaluation, i.e., examining the overall performance of adapting large-scale foundation models to a set of diverse real-world clinical tasks. We collect five sets of medical imaging data from multiple institutes targeting a variety of real-world clinical tasks (22,349 images in total), i.e., thoracic disease screening in X-rays, pathological lesion tissue screening, lesion detection in endoscopy images, neonatal jaundice evaluation, and diabetic retinopathy grading. Results of multiple baseline methods are demonstrated using the proposed dataset from both accuracy and cost-effectiveness perspectives.
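As one generic example of the kind of adaptation such a benchmark evaluates, a linear probe can be fit on frozen foundation-model features from a handful of labeled samples per class; this is only an illustrative baseline and not necessarily one used in the paper.

```python
# Illustrative few-shot linear probe on frozen foundation-model features.
import numpy as np
from sklearn.linear_model import LogisticRegression

def linear_probe(train_feats: np.ndarray, train_labels: np.ndarray,
                 test_feats: np.ndarray) -> np.ndarray:
    """train_feats: (n_shots * n_classes, d) features from a frozen backbone;
    returns predicted class labels for test_feats."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(train_feats, train_labels)   # only the probe is trained
    return clf.predict(test_feats)
```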

* Preprint. Under review 

PropNet: Propagating 2D Annotation to 3D Segmentation for Gastric Tumors on CT Scans

May 29, 2023
Zifan Chen, Jiazheng Li, Jie Zhao, Yiting Liu, Hongfeng Li, Bin Dong, Lei Tang, Li Zhang

**Background:** Accurate 3D CT scan segmentation of gastric tumors is pivotal for diagnosis and treatment. The challenges lie in the irregular shapes, blurred boundaries of tumors, and the inefficiency of existing methods. **Purpose:** We conducted a study to introduce a model, utilizing human-guided knowledge and unique modules, to address the challenges of 3D tumor segmentation. **Methods:** We developed the PropNet framework, propagating radiologists' knowledge from 2D annotations to the entire 3D space. This model consists of a proposing stage for coarse segmentation and a refining stage for improved segmentation, using two-way branches for enhanced performance and an up-down strategy for efficiency. **Results:** With 98 patient scans for training and 30 for validation, our method achieves a significant agreement with manual annotation (Dice of 0.803) and improves efficiency. The performance is comparable in different scenarios and with various radiologists' annotations (Dice between 0.785 and 0.803). Moreover, the model shows improved prognostic prediction performance (C-index of 0.620 vs. 0.576) on an independent validation set of 42 patients with advanced gastric cancer. **Conclusions:** Our model generates accurate tumor segmentation efficiently and stably, improving prognostic performance and reducing high-throughput image reading workload. This model can accelerate the quantitative analysis of gastric tumors and enhance downstream task performance.
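Schematically, the up-down propagation idea can be sketched as below, assuming a hypothetical 2D `propagate` network that transfers a mask from one slice to its neighbour; the real PropNet's proposing/refining stages and two-way branches are more involved.

```python
# Illustrative slice-by-slice propagation of a single 2D annotation through a CT volume.
import numpy as np

def propagate(prev_mask: np.ndarray, prev_slice: np.ndarray, cur_slice: np.ndarray) -> np.ndarray:
    """Stub: a 2D network predicting the mask on cur_slice from the previous slice and mask."""
    raise NotImplementedError

def annotate_volume(volume: np.ndarray, z0: int, mask0: np.ndarray) -> np.ndarray:
    """volume: (Z, H, W) CT scan; z0, mask0: the radiologist-annotated slice and its mask."""
    masks = np.zeros(volume.shape, dtype=np.uint8)
    masks[z0] = mask0
    for step in (+1, -1):                          # propagate upward, then downward
        z, mask = z0, mask0
        while 0 <= z + step < volume.shape[0]:
            mask = propagate(mask, volume[z], volume[z + step])
            if mask.sum() == 0:                    # tumor no longer visible on this slice
                break
            z += step
            masks[z] = mask
    return masks
```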


NeuralMatrix: Moving Entire Neural Networks to General Matrix Multiplication for Efficient Inference

May 23, 2023
Ruiqi Sun, Jie Zhao, Xin He, Yiran Li, An Zou

In this study, we introduce NeuralMatrix, a novel framework that enables the computation of versatile deep neural networks (DNNs) on a single general matrix multiplication (GEMM) accelerator. The proposed approach overcomes the specificity limitations of ASIC-based accelerators while achieving application-specific acceleration levels compared to general-purpose processors such as CPUs and GPUs. We address the challenges of mapping both linear and nonlinear operations in DNN computation to general matrix multiplications and the impact of using a GEMM accelerator on DNN inference accuracy. Extensive experiments are conducted on various DNN models from three popular categories (i.e., CNN, Transformers, and GNN) as illustrative backbone models. Our results demonstrate that DNNs suffer only up to a 2.02% accuracy loss after being converted to general matrix multiplication, while achieving improvements of 19.44x to 113x in throughput per power compared to CPUs and GPUs.
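As a toy illustration of the underlying idea (not NeuralMatrix itself), a nonlinear activation such as GELU can be replaced by a piecewise-linear approximation y = a_i*x + b_i, so each segment reduces to the multiply-accumulate work a GEMM engine already provides; the segment count and input range below are arbitrary.

```python
# Illustrative piecewise-linear approximation of GELU for GEMM-friendly inference.
import numpy as np

def build_piecewise_gelu(n_segments: int = 32, lo: float = -6.0, hi: float = 6.0):
    xs = np.linspace(lo, hi, n_segments + 1)
    gelu = lambda x: 0.5 * x * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))
    ys = gelu(xs)
    a = (ys[1:] - ys[:-1]) / (xs[1:] - xs[:-1])   # slope per segment
    b = ys[:-1] - a * xs[:-1]                     # intercept per segment
    return xs, a, b

def piecewise_gelu(x: np.ndarray, xs: np.ndarray, a: np.ndarray, b: np.ndarray) -> np.ndarray:
    idx = np.clip(np.searchsorted(xs, x) - 1, 0, len(a) - 1)   # segment index per element
    return a[idx] * x + b[idx]                    # per-element affine -> multiply-accumulate
```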

* 12 pages, 4 figures, Submitted to 37th Conference on Neural Information Processing Systems (NeurIPS 2023) 