Radiology report generation aims to automatically provide clinically meaningful descriptions of radiology images such as MRI and X-ray. Although great success has been achieved in natural scene image captioning tasks, radiology report generation remains challenging and requires prior medical knowledge. In this paper, we propose PromptRRG, a method that utilizes prompt learning to activate a pretrained model and incorporate prior knowledge. Since prompt learning for radiology report generation has not been explored before, we begin by investigating prompt designs and categorise them based on varying levels of knowledge: common, domain-specific and disease-enriched prompts. Additionally, we propose an automatic prompt learning mechanism to alleviate the burden of manual prompt engineering. This is the first work to systematically examine the effectiveness of prompt learning for radiology report generation. Experimental results on the largest radiology report generation benchmark, MIMIC-CXR, demonstrate that our proposed method achieves state-of-the-art performance. Code will be made available upon acceptance.
We propose MemoChat, a pipeline for refining instructions that enables large language models (LLMs) to effectively employ self-composed memos for maintaining consistent long-range open-domain conversations. We demonstrate a long-range open-domain conversation through iterative "memorization-retrieval-response" cycles. This requires us to carefully design tailored tuning instructions for each distinct stage. The instructions are reconstructed from a collection of public datasets to teach the LLMs to memorize and retrieve past dialogues with structured memos, leading to enhanced consistency when participating in future conversations. We invite experts to manually annotate a test set designed to evaluate consistency on long-range conversation questions. Experiments on three testing scenarios involving both open-source and API-accessible chatbots at scale verify the efficacy of MemoChat, which outperforms strong baselines. Our code, data and models are available here: https://github.com/LuJunru/MemoChat.
Given a controversial target such as "nuclear energy", argument mining aims to identify the argumentative text from heterogeneous sources. Current approaches focus on exploring better ways of integrating the target-associated semantic information with the argumentative text. Despite their empirical successes, two issues remain unsolved: (i) a target is represented by a word or a phrase, which is insufficient to cover a diverse set of target-related subtopics; (ii) the sentence-level topic information within an argument, which we believe is crucial for argument mining, is ignored. To tackle the above issues, we propose a novel explainable topic-enhanced argument mining approach. Specifically, with the use of the neural topic model and the language model, the target information is augmented by explainable topic representations. Moreover, the sentence-level topic information within the argument is captured by minimizing the distance between its latent topic distribution and its semantic representation through mutual learning. Experiments have been conducted on the benchmark dataset in both the in-target setting and the cross-target setting. Results demonstrate the superiority of the proposed model against the state-of-the-art baselines.
Text classifiers built on Pre-trained Language Models (PLMs) have achieved remarkable progress in various tasks including sentiment analysis, natural language inference, and question-answering. However, the occurrence of uncertain predictions by these classifiers poses a challenge to their reliability when deployed in practical applications. Much effort has been devoted to designing various probes in order to understand what PLMs capture. But few studies have delved into factors influencing PLM-based classifiers' predictive uncertainty. In this paper, we propose a novel framework, called CUE, which aims to interpret uncertainties inherent in the predictions of PLM-based models. In particular, we first map PLM-encoded representations to a latent space via a variational auto-encoder. We then generate text representations by perturbing the latent space which causes fluctuation in predictive uncertainty. By comparing the difference in predictive uncertainty between the perturbed and the original text representations, we are able to identify the latent dimensions responsible for uncertainty and subsequently trace back to the input features that contribute to such uncertainty. Our extensive experiments on four benchmark datasets encompassing linguistic acceptability classification, emotion classification, and natural language inference show the feasibility of our proposed framework. Our source code is available at: https://github.com/lijiazheng99/CUE.
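The perturb-and-compare step described in the abstract above can be illustrated with a minimal sketch. It is not the CUE implementation: the entropy-based uncertainty measure, the additive perturbation of one latent dimension at a time, and the names `uncertainty_shift_per_dimension`, `classify` and `delta` are all assumptions made for illustration, and the toy classifier stands in for a real decoder plus classification head.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    # Shannon entropy (in nats), used here as the uncertainty measure (an assumption).
    return -sum(p * math.log(p) for p in probs if p > 0)

def uncertainty_shift_per_dimension(latent, classify, delta=1.0):
    """Perturb each latent dimension by `delta` and return the change in
    predictive entropy relative to the unperturbed latent vector.
    `classify` is a hypothetical function mapping a latent vector to logits."""
    base = entropy(softmax(classify(latent)))
    shifts = []
    for i in range(len(latent)):
        perturbed = list(latent)
        perturbed[i] += delta
        shifts.append(entropy(softmax(classify(perturbed))) - base)
    return shifts

# Toy stand-in classifier: only latent dimension 0 influences the logits.
def toy_classifier(z):
    return [z[0], -z[0]]

shifts = uncertainty_shift_per_dimension([0.0, 0.0], toy_classifier)
```

In this toy setting, only dimension 0 changes the entropy, so it would be the dimension flagged as responsible for the uncertainty; dimension 1 leaves the prediction untouched.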
Document-level multi-event extraction aims to extract the structural information from a given document automatically. Most recent approaches usually involve two steps: (1) modeling entity interactions; (2) decoding entity interactions into events. However, such approaches ignore a global view of inter-dependency of multiple events. Moreover, an event is decoded by iteratively merging its related entities as arguments, which might suffer from error propagation and is computationally inefficient. In this paper, we propose an alternative approach for document-level multi-event extraction with event proxy nodes and Hausdorff distance minimization. The event proxy nodes, representing pseudo-events, are able to build connections with other event proxy nodes, essentially capturing global information. The Hausdorff distance makes it possible to compare the similarity between the set of predicted events and the set of ground-truth events. By directly minimizing the Hausdorff distance, the model is trained towards the global optimum, which improves performance and reduces training time. Experimental results show that our model outperforms the previous state-of-the-art method in F1-score on two datasets with only a fraction of the training time.
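As a reference point for the set-to-set comparison mentioned in the abstract above, the following sketch computes the classical Hausdorff distance between two finite point sets. It assumes events are represented as plain numeric vectors under Euclidean distance; the event representation and the exact distance used for training in the paper are not specified here, and `hausdorff_distance` is an illustrative name.

```python
import math

def hausdorff_distance(set_a, set_b):
    """Classical (symmetric) Hausdorff distance between two non-empty
    finite sets of points, using Euclidean distance between points."""
    def directed(src, dst):
        # For each point in src, find its nearest neighbour in dst,
        # then take the worst case over src.
        return max(min(math.dist(p, q) for q in dst) for p in src)
    return max(directed(set_a, set_b), directed(set_b, set_a))

predicted = [(0.0, 0.0), (1.0, 0.0)]
gold = [(0.0, 0.0), (4.0, 0.0)]
distance = hausdorff_distance(predicted, gold)  # 3.0
```

The distance is zero only when every element of each set has an exact match in the other, which is why it is a natural measure for comparing a whole set of predicted events against a whole set of ground-truth events.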
The exceptional performance of pre-trained large language models has revolutionised various applications, but their adoption in production environments is hindered by prohibitive costs and inefficiencies, particularly when utilising long prompts. This paper proposes OverPrompt, an in-context learning method aimed at improving LLM efficiency and performance by processing multiple inputs in parallel. Evaluated across diverse datasets, OverPrompt enhances task efficiency and integrates a diverse range of examples for improved performance. In particular, it improves performance on fact-checking and sentiment analysis tasks when supplemented with contextual information. Synthetic data grouping further enhances performance, suggesting a viable approach for data augmentation.
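The idea of processing multiple inputs in one request, as described in the abstract above, can be sketched as a prompt builder and a response parser. The prompt wording, the numbered-line answer format, and the function names `build_batched_prompt` and `parse_batched_answers` are assumptions for illustration and are not taken from OverPrompt itself; no model call is made.

```python
def build_batched_prompt(instruction, inputs):
    """Pack several inputs into a single prompt so the task instruction
    is sent once rather than once per input (hypothetical format)."""
    numbered = [f"{i}. {text}" for i, text in enumerate(inputs, start=1)]
    footer = "Answer with one numbered label per line."
    return "\n".join([instruction, *numbered, footer])

def parse_batched_answers(response, n_inputs):
    """Map lines such as '2. negative' back to their input positions.
    Inputs with no parsable answer are returned as None."""
    answers = {}
    for line in response.splitlines():
        head, sep, tail = line.partition(".")
        if sep and head.strip().isdigit():
            answers[int(head.strip())] = tail.strip()
    return [answers.get(i) for i in range(1, n_inputs + 1)]

prompt = build_batched_prompt("Classify the sentiment of each review.",
                              ["Great phone.", "Battery died in a day."])
labels = parse_batched_answers("1. positive\n2. negative", 2)
```

Returning `None` for missing answers keeps the output aligned with the inputs, so a caller can retry only the items the model skipped.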
Assessing student answers and providing valuable feedback is crucial for effective learning, but it can be a time-consuming task. Traditional methods of automating student answer assessment through text classification often suffer from issues such as lack of trustworthiness, transparency, and the ability to provide a rationale for the automated assessment process. These limitations hinder their usefulness in practice. In this paper, we explore using ChatGPT, a cutting-edge large language model, for the concurrent tasks of student answer scoring and rationale generation under both the zero-shot and few-shot settings. We introduce a critic module which automatically filters incorrect outputs from ChatGPT and utilizes the remaining ChatGPT outputs as noisy labelled data to fine-tune a smaller language model, enabling it to perform student answer scoring and rationale generation. Moreover, by drawing multiple samples from ChatGPT outputs, we are able to compute predictive confidence scores, which in turn can be used to identify corrupted data and human label errors in the training set. Our experimental results demonstrate that despite being a few orders of magnitude smaller than ChatGPT, the fine-tuned language model achieves better performance in student answer scoring. Furthermore, it generates more detailed and comprehensible assessments than traditional text classification methods. Our approach provides a viable solution to achieve explainable automated assessment in education.
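One simple way to turn multiple sampled outputs into a confidence score, as mentioned in the abstract above, is the agreement rate of the majority answer. The sketch below assumes that definition; the abstract does not state the exact formula, and the 0.8 threshold and the names `vote_confidence` and `flag_suspect_labels` are hypothetical.

```python
from collections import Counter

def vote_confidence(sampled_scores):
    """Return the majority score among the samples and the fraction of
    samples that agree with it (used here as the confidence)."""
    counts = Counter(sampled_scores)
    label, freq = counts.most_common(1)[0]
    return label, freq / len(sampled_scores)

def flag_suspect_labels(examples, threshold=0.8):
    """examples: list of (human_label, sampled_scores) pairs.
    Flag indices where the samples agree confidently with each other
    but disagree with the human label."""
    flagged = []
    for idx, (human_label, samples) in enumerate(examples):
        label, confidence = vote_confidence(samples)
        if confidence >= threshold and label != human_label:
            flagged.append(idx)
    return flagged

examples = [
    (2, [2, 2, 2, 2]),     # samples agree with the human label
    (1, [3, 3, 3, 3, 3]),  # confident disagreement: possible label error
    (0, [0, 1, 2, 1]),     # low agreement: not flagged
]
suspects = flag_suspect_labels(examples)  # [1]
```

Flagged items are only candidates for manual review; a confident disagreement can also mean that the model is consistently wrong.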
With the development of neural topic models in recent years, topic modelling is playing an increasingly important role in natural language understanding. However, most existing topic models still rely on bag-of-words (BoW) information, either as training input or training target. This limits their ability to capture word order information in documents and causes them to suffer from the out-of-vocabulary (OOV) issue, i.e. they cannot handle unobserved words in new documents. Contextualized word embeddings from pre-trained language models are superior at word sense disambiguation and have proven effective in dealing with OOV words. In this work, we develop a novel neural topic model combining contextualized word embeddings from the pre-trained language model BERT. The model can infer the topic distribution of a document without using any BoW information. In addition, the model can infer the topic distribution of each word in a document directly from the contextualized word embeddings. Experiments on several datasets show that our model outperforms existing topic models in terms of both document classification and topic coherence metrics and can accommodate unseen words from newly arrived documents. Experiments on the NER dataset also show that our model can produce high-quality word topic representations.
Explainable recommender systems can explain their recommendation decisions, enhancing user trust in the systems. Most explainable recommender systems either rely on human-annotated rationales to train models for explanation generation or leverage the attention mechanism to extract important text spans from reviews as explanations. The extracted rationales are often confined to an individual review and may fail to identify the implicit features beyond the review text. To avoid the expensive human annotation process and to generate explanations beyond individual reviews, we propose to incorporate a geometric prior learnt from user-item interactions into a variational network which infers latent factors from user-item reviews. The latent factors from an individual user-item pair can be used for both recommendation and explanation generation, which naturally inherit the global characteristics encoded in the prior knowledge. Experimental results on three e-commerce datasets show that our model significantly improves the interpretability of a variational recommender using the Wasserstein distance while achieving performance comparable to existing content-based recommender systems in terms of recommendation behaviours.