
Yulan He


Can Prompt Learning Benefit Radiology Report Generation?

Aug 30, 2023
Jun Wang, Lixing Zhu, Abhir Bhalerao, Yulan He

Radiology report generation aims to automatically provide clinically meaningful descriptions of radiology images such as MRI and X-ray. Although great success has been achieved in natural scene image captioning tasks, radiology report generation remains challenging and requires prior medical knowledge. In this paper, we propose PromptRRG, a method that utilizes prompt learning to activate a pretrained model and incorporate prior knowledge. Since prompt learning for radiology report generation has not been explored before, we begin by investigating prompt designs and categorise them based on varying levels of knowledge: common, domain-specific and disease-enriched prompts. Additionally, we propose an automatic prompt learning mechanism to alleviate the burden of manual prompt engineering. This is the first work to systematically examine the effectiveness of prompt learning for radiology report generation. Experimental results on the largest radiology report generation benchmark, MIMIC-CXR, demonstrate that our proposed method achieves state-of-the-art performance. Code will be made available upon acceptance.
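The three knowledge levels can be illustrated with a minimal sketch. The actual PromptRRG templates are not public, so the wording below (and the `build_prompt` helper itself) is purely a hypothetical illustration of common vs. domain-specific vs. disease-enriched prompting.

```python
# Illustrative prompt templates at the three knowledge levels named in the
# abstract; all wording here is assumed, not taken from the paper.

def build_prompt(level, findings=None):
    """Return a report-generation prompt at the requested knowledge level."""
    if level == "common":
        # Generic instruction carrying no medical knowledge.
        return "Describe this image in detail."
    if level == "domain":
        # Domain-specific: tells the model the task is radiology reporting.
        return "You are a radiologist. Describe the findings in this chest X-ray."
    if level == "disease":
        # Disease-enriched: injects candidate disease labels (assumed to be
        # supplied by an upstream classifier or retrieval step).
        labels = ", ".join(findings or [])
        return "Possible findings: " + labels + ". Write the radiology report."
    raise ValueError("unknown prompt level: " + level)
```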

* 8 pages, with a 6-page supplementary file 

MemoChat: Tuning LLMs to Use Memos for Consistent Long-Range Open-Domain Conversation

Aug 23, 2023
Junru Lu, Siyu An, Mingbao Lin, Gabriele Pergola, Yulan He, Di Yin, Xing Sun, Yunsheng Wu

Figures 1-4 for MemoChat: Tuning LLMs to Use Memos for Consistent Long-Range Open-Domain Conversation

We propose MemoChat, a pipeline for refining instructions that enables large language models (LLMs) to effectively employ self-composed memos for maintaining consistent long-range open-domain conversations. We demonstrate a long-range open-domain conversation through iterative "memorization-retrieval-response" cycles. This requires us to carefully design tailored tuning instructions for each distinct stage. The instructions are reconstructed from a collection of public datasets to teach the LLMs to memorize and retrieve past dialogues with structured memos, leading to enhanced consistency when participating in future conversations. We invite experts to manually annotate a test set designed to evaluate the consistency of long-range conversations. Experiments on three testing scenarios involving both open-source and API-accessible chatbots at scale verify the efficacy of MemoChat, which outperforms strong baselines. Our codes, data and models are available here: https://github.com/LuJunru/MemoChat.
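The "memorization-retrieval-response" cycle can be sketched as follows. This is a toy version assuming a simple keyword-overlap retriever; the MemoChat pipeline itself tunes an LLM to write and consult the structured memos, which is not reproduced here.

```python
# Toy memorization-retrieval-response loop with structured memos.

class MemoStore:
    def __init__(self):
        self.memos = []  # each memo: {"topic": ..., "summary": ...}

    def memorize(self, topic, summary):
        # Memorization stage: store a structured memo of a past dialogue.
        self.memos.append({"topic": topic, "summary": summary})

    def retrieve(self, query):
        # Retrieval stage: return memos whose topic words overlap the query.
        words = set(query.lower().split())
        return [m for m in self.memos
                if words & set(m["topic"].lower().split())]

def respond(store, user_turn):
    # Response stage: a real system would condition the LLM on the
    # retrieved memos; here we only report what was recalled.
    context = store.retrieve(user_turn)
    if context:
        return "(recalling %d memo(s)) ..." % len(context)
    return "(no relevant memo) ..."
```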


Explainable Topic-Enhanced Argument Mining from Heterogeneous Sources

Jul 22, 2023
Jiasheng Si, Yingjie Zhu, Xingyu Shi, Deyu Zhou, Yulan He

Figures 1-4 for Explainable Topic-Enhanced Argument Mining from Heterogeneous Sources

Given a controversial target such as "nuclear energy", argument mining aims to identify the argumentative text from heterogeneous sources. Current approaches focus on exploring better ways of integrating the target-associated semantic information with the argumentative text. Despite their empirical successes, two issues remain unsolved: (i) a target is represented by a word or a phrase, which is insufficient to cover a diverse set of target-related subtopics; (ii) the sentence-level topic information within an argument, which we believe is crucial for argument mining, is ignored. To tackle the above issues, we propose a novel explainable topic-enhanced argument mining approach. Specifically, with the use of the neural topic model and the language model, the target information is augmented by explainable topic representations. Moreover, the sentence-level topic information within the argument is captured by minimizing the distance between its latent topic distribution and its semantic representation through mutual learning. Experiments have been conducted on the benchmark dataset in both the in-target and cross-target settings. Results demonstrate the superiority of the proposed model over the state-of-the-art baselines.

* 10 pages, 3 figures 

CUE: An Uncertainty Interpretation Framework for Text Classifiers Built on Pre-Trained Language Models

Jun 06, 2023
Jiazheng Li, Zhaoyue Sun, Bin Liang, Lin Gui, Yulan He

Figures 1-4 for CUE: An Uncertainty Interpretation Framework for Text Classifiers Built on Pre-Trained Language Models

Text classifiers built on Pre-trained Language Models (PLMs) have achieved remarkable progress in various tasks including sentiment analysis, natural language inference, and question-answering. However, the occurrence of uncertain predictions by these classifiers poses a challenge to their reliability when deployed in practical applications. Much effort has been devoted to designing various probes in order to understand what PLMs capture. But few studies have delved into factors influencing PLM-based classifiers' predictive uncertainty. In this paper, we propose a novel framework, called CUE, which aims to interpret uncertainties inherent in the predictions of PLM-based models. In particular, we first map PLM-encoded representations to a latent space via a variational auto-encoder. We then generate text representations by perturbing the latent space which causes fluctuation in predictive uncertainty. By comparing the difference in predictive uncertainty between the perturbed and the original text representations, we are able to identify the latent dimensions responsible for uncertainty and subsequently trace back to the input features that contribute to such uncertainty. Our extensive experiments on four benchmark datasets encompassing linguistic acceptability classification, emotion classification, and natural language inference show the feasibility of our proposed framework. Our source code is available at: https://github.com/lijiazheng99/CUE.
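The perturb-and-compare idea can be sketched in a few lines. This is a toy illustration, not CUE's actual architecture: the random linear map below stands in for the real PLM encoder, VAE, and classifier head, and predictive uncertainty is measured as softmax entropy.

```python
# Toy sketch: perturb each latent dimension, measure the change in
# predictive entropy, and rank dimensions by how much they move the
# uncertainty. All components here are stand-ins.
import numpy as np

def entropy(logits):
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return float(-(p * np.log(p + 1e-12)).sum())

def uncertainty_shift(z, classifier_w, eps=0.5):
    """Entropy change caused by perturbing each latent dimension of z."""
    base = entropy(z @ classifier_w)
    shifts = []
    for i in range(len(z)):
        z_pert = z.copy()
        z_pert[i] += eps          # perturb one latent dimension
        shifts.append(abs(entropy(z_pert @ classifier_w) - base))
    return np.array(shifts)

rng = np.random.default_rng(0)
z = rng.normal(size=8)            # latent code from the (assumed) encoder
w = rng.normal(size=(8, 3))       # stand-in classifier head
shifts = uncertainty_shift(z, w)  # per-dimension uncertainty attribution
```

Dimensions with the largest shifts would then be traced back to the input features that contribute most to the prediction's uncertainty.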

* Accepted to UAI 2023 

Document-Level Multi-Event Extraction with Event Proxy Nodes and Hausdorff Distance Minimization

May 30, 2023
Xinyu Wang, Lin Gui, Yulan He

Figures 1-4 for Document-Level Multi-Event Extraction with Event Proxy Nodes and Hausdorff Distance Minimization

Document-level multi-event extraction aims to extract the structural information from a given document automatically. Most recent approaches usually involve two steps: (1) modeling entity interactions; (2) decoding entity interactions into events. However, such approaches ignore a global view of the inter-dependency of multiple events. Moreover, an event is decoded by iteratively merging its related entities as arguments, which might suffer from error propagation and is computationally inefficient. In this paper, we propose an alternative approach for document-level multi-event extraction with event proxy nodes and Hausdorff distance minimization. The event proxy nodes, representing pseudo-events, are able to build connections with other event proxy nodes, essentially capturing global information. The Hausdorff distance makes it possible to compare the similarity between the set of predicted events and the set of ground-truth events. By directly minimizing the Hausdorff distance, the model is trained towards the global optimum directly, which improves performance and reduces training time. Experimental results show that our model outperforms the previous state-of-the-art method in F1-score on two datasets with only a fraction of the training time.
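The set-level objective rests on the standard symmetric Hausdorff distance between two point sets, which can be computed directly. The toy 2-D points below stand in for event embeddings; the paper minimizes this distance end-to-end rather than merely evaluating it.

```python
# Symmetric Hausdorff distance between a set of predicted event embeddings
# and a set of ground-truth event embeddings (toy 2-D stand-ins).
import numpy as np

def hausdorff(A, B):
    """Symmetric Hausdorff distance between point sets A (n x d), B (m x d)."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)  # pairwise
    # Largest nearest-neighbour gap, taken in both directions.
    return max(d.min(axis=1).max(), d.min(axis=0).max())

pred = np.array([[0.0, 0.0], [1.0, 1.0]])
gold = np.array([[0.0, 0.1], [1.0, 1.0], [2.0, 2.0]])
# The unmatched gold point [2, 2] dominates: distance = sqrt(2).
```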


OverPrompt: Enhancing ChatGPT Capabilities through an Efficient In-Context Learning Approach

May 24, 2023
Jiazheng Li, Runcong Zhao, Yulan He, Lin Gui

Figures 1-4 for OverPrompt: Enhancing ChatGPT Capabilities through an Efficient In-Context Learning Approach

The exceptional performance of pre-trained large language models has revolutionised various applications, but their adoption in production environments is hindered by prohibitive costs and inefficiencies, particularly when utilising long prompts. This paper proposes OverPrompt, an in-context learning method aimed at improving LLM efficiency and performance by processing multiple inputs in parallel. Evaluated across diverse datasets, OverPrompt enhances task efficiency and integrates a diverse range of examples for improved performance. In particular, it improves performance on fact-checking and sentiment analysis tasks when supplemented with contextual information. Synthetic data grouping further enhances performance, suggesting a viable approach for data augmentation.
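The core batching idea can be sketched simply: instead of one API call per input, pack several inputs into a single numbered prompt and parse one combined answer. The `pack_prompt` helper and its output format are illustrative assumptions, not the paper's exact prompt.

```python
# Hypothetical sketch of grouping multiple classification inputs into one
# prompt so a single LLM call covers them all.

def pack_prompt(task_instruction, inputs):
    lines = [task_instruction, ""]
    for i, text in enumerate(inputs, 1):
        lines.append("%d. %s" % (i, text))
    lines.append("")
    lines.append("Answer each item on its own line as '<number>: <label>'.")
    return "\n".join(lines)

prompt = pack_prompt(
    "Classify the sentiment of each sentence as positive or negative.",
    ["I loved this film.", "The service was terrible."],
)
```

One call over n grouped inputs amortises the fixed instruction tokens across all items, which is where the cost saving comes from.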


Distilling ChatGPT for Explainable Automated Student Answer Assessment

May 22, 2023
Jiazheng Li, Lin Gui, Yuxiang Zhou, David West, Cesare Aloisi, Yulan He

Figures 1-4 for Distilling ChatGPT for Explainable Automated Student Answer Assessment

Assessing student answers and providing valuable feedback is crucial for effective learning, but it can be a time-consuming task. Traditional methods of automating student answer assessment through text classification often suffer from issues such as lack of trustworthiness, transparency, and the ability to provide a rationale for the automated assessment process. These limitations hinder their usefulness in practice. In this paper, we explore using ChatGPT, a cutting-edge large language model, for the concurrent tasks of student answer scoring and rationale generation under both the zero-shot and few-shot settings. We introduce a critic module which automatically filters incorrect outputs from ChatGPT and utilizes the remaining ChatGPT outputs as noisy labelled data to fine-tune a smaller language model, enabling it to perform student answer scoring and rationale generation. Moreover, by drawing multiple samples from ChatGPT outputs, we are able to compute predictive confidence scores, which in turn can be used to identify corrupted data and human label errors in the training set. Our experimental results demonstrate that despite being a few orders of magnitude smaller than ChatGPT, the fine-tuned language model achieves better performance in student answer scoring. Furthermore, it generates more detailed and comprehensible assessments than traditional text classification methods. Our approach provides a viable solution to achieve explainable automated assessment in education.
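The multi-sample confidence idea can be shown with a toy example: draw several scores for the same answer from the teacher model and use their agreement as a predictive confidence score. The `samples` list stands in for repeated ChatGPT outputs; the actual scoring of agreement in the paper may differ.

```python
# Toy confidence estimate from repeated teacher-model samples: the majority
# label and the fraction of samples that agree with it.
from collections import Counter

def confidence(samples):
    """Majority label and its agreement ratio over repeated samples."""
    label, count = Counter(samples).most_common(1)[0]
    return label, count / len(samples)

label, conf = confidence([2, 2, 2, 1, 2])  # five sampled scores for one answer
```

Low-agreement items are then candidates for corrupted data or human label errors in the training set.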


BERTTM: Leveraging Contextualized Word Embeddings from Pre-trained Language Models for Neural Topic Modeling

May 17, 2023
Zheng Fang, Yulan He, Rob Procter

Figures 1-4 for BERTTM: Leveraging Contextualized Word Embeddings from Pre-trained Language Models for Neural Topic Modeling

With the development of neural topic models in recent years, topic modelling is playing an increasingly important role in natural language understanding. However, most existing topic models still rely on bag-of-words (BoW) information, either as training input or training target. This limits their ability to capture word order information in documents and causes them to suffer from the out-of-vocabulary (OOV) issue, i.e. they cannot handle unobserved words in new documents. Contextualized word embeddings from pre-trained language models excel at word sense disambiguation and have proven effective in handling OOV words. In this work, we develop a novel neural topic model that combines contextualized word embeddings from the pre-trained language model BERT. The model can infer the topic distribution of a document without using any BoW information. In addition, the model can infer the topic distribution of each word in a document directly from the contextualized word embeddings. Experiments on several datasets show that our model outperforms existing topic models in terms of both document classification and topic coherence metrics and can accommodate unseen words from newly arrived documents. Experiments on the NER dataset also show that our model can produce high-quality word topic representations.
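A common way to obtain per-word topic distributions from contextualized embeddings is a softmax over word-topic similarities, sketched below. This is a generic neural topic-model construction offered for illustration; BERTTM's exact parameterization may differ.

```python
# Assign a topic distribution to each contextualized word embedding via a
# softmax over learned topic vectors (illustrative construction only).
import numpy as np

def word_topic_dist(word_embs, topic_embs):
    """Row-wise softmax over word-topic similarities: (n_words, n_topics)."""
    logits = word_embs @ topic_embs.T
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

rng = np.random.default_rng(1)
# 5 stand-in contextualized word embeddings, 4 stand-in topic vectors.
dist = word_topic_dist(rng.normal(size=(5, 16)), rng.normal(size=(4, 16)))
```

Because the inputs are contextual, the same surface word can receive different topic distributions in different sentences, and unseen words pose no problem.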

* The paper requires major revision. Reviewers from a journal queried about some fundamental assumptions of the proposed approach 

Explainable Recommender with Geometric Information Bottleneck

May 09, 2023
Hanqi Yan, Lin Gui, Menghan Wang, Kun Zhang, Yulan He

Figures 1-4 for Explainable Recommender with Geometric Information Bottleneck

Explainable recommender systems can explain their recommendation decisions, enhancing user trust in the systems. Most explainable recommender systems either rely on human-annotated rationales to train models for explanation generation or leverage the attention mechanism to extract important text spans from reviews as explanations. The extracted rationales are often confined to an individual review and may fail to identify the implicit features beyond the review text. To avoid the expensive human annotation process and to generate explanations beyond individual reviews, we propose to incorporate a geometric prior learnt from user-item interactions into a variational network which infers latent factors from user-item reviews. The latent factors from an individual user-item pair can be used for both recommendation and explanation generation, which naturally inherit the global characteristics encoded in the prior knowledge. Experimental results on three e-commerce datasets show that our model significantly improves the interpretability of a variational recommender using the Wasserstein distance while achieving performance comparable to existing content-based recommender systems in terms of recommendation behaviours.

* Under Review 