Gabriele Pergola

MemoChat: Tuning LLMs to Use Memos for Consistent Long-Range Open-Domain Conversation

Aug 23, 2023
Junru Lu, Siyu An, Mingbao Lin, Gabriele Pergola, Yulan He, Di Yin, Xing Sun, Yunsheng Wu

We propose MemoChat, a pipeline for refining instructions that enables large language models (LLMs) to effectively employ self-composed memos for maintaining consistent long-range open-domain conversations. We demonstrate a long-range open-domain conversation through iterative "memorization-retrieval-response" cycles. This requires us to carefully design tailored tuning instructions for each distinct stage. The instructions are reconstructed from a collection of public datasets to teach the LLMs to memorize and retrieve past dialogues with structured memos, leading to enhanced consistency when participating in future conversations. We invite experts to manually annotate a test set designed to evaluate the consistency of long-range conversations. Experiments on three testing scenarios involving both open-source and API-accessible chatbots at scale verify the efficacy of MemoChat, which outperforms strong baselines. Our code, data and models are available at https://github.com/LuJunru/MemoChat.
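The "memorization-retrieval-response" cycle described in the abstract can be sketched roughly as follows. This is a minimal illustration only: the memo schema, the keyword-overlap retrieval, and the stub LLM call are placeholder assumptions, not MemoChat's actual tuned components.

```python
# Toy sketch of a memo-based chat loop in the spirit of MemoChat.

def memorize(memo, topic, summary):
    """Store a structured memo entry summarizing past dialogue turns."""
    memo.append({"topic": topic, "summary": summary})

def retrieve(memo, query, top_k=1):
    """Rank memo entries by naive keyword overlap with the new user query."""
    q = set(query.lower().split())
    scored = sorted(
        memo,
        key=lambda e: len(q & set((e["topic"] + " " + e["summary"]).lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def respond(memo, query, llm=lambda prompt: f"[LLM reply to: {prompt}]"):
    """Compose the prompt from retrieved memos, then call the (stub) LLM."""
    context = "; ".join(e["summary"] for e in retrieve(memo, query))
    return llm(f"Context: {context} User: {query}")

memo = []
memorize(memo, "travel plans", "User wants to visit Kyoto in spring.")
memorize(memo, "diet", "User is vegetarian.")
print(respond(memo, "Any food suggestions for my Kyoto trip?"))
```

In the paper the memorization and retrieval steps are performed by the tuned LLM itself via instructions, not by hand-written heuristics like the overlap score above.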

Event Knowledge Incorporation with Posterior Regularization for Event-Centric Question Answering

May 08, 2023
Junru Lu, Gabriele Pergola, Lin Gui, Yulan He

We propose a simple yet effective strategy to incorporate event knowledge extracted from event trigger annotations via posterior regularization to improve the event reasoning capability of mainstream question-answering (QA) models for event-centric QA. In particular, we define event-related knowledge constraints based on the event trigger annotations in the QA datasets, and subsequently use them to regularize the posterior answer output probabilities from the backbone pre-trained language models used in the QA setting. We explore two different posterior regularization strategies for extractive and generative QA separately. For extractive QA, the sentence-level event knowledge constraint is defined by assessing if a sentence contains an answer event or not, which is later used to modify the answer span extraction probability. For generative QA, the token-level event knowledge constraint is defined by comparing the generated token from the backbone language model with the answer event in order to introduce a reward or penalty term, which essentially adjusts the answer generative probability indirectly. We conduct experiments on two event-centric QA datasets, TORQUE and ESTER. The results show that our proposed approach can effectively inject event knowledge into existing pre-trained language models and achieves strong performance compared to existing QA models in answer evaluation. Code and models can be found at https://github.com/LuJunru/EventQAviaPR.
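The sentence-level constraint for extractive QA can be illustrated with a small sketch: spans drawn from sentences predicted not to contain an answer event are down-weighted, then the distribution is renormalized. The penalty factor and data layout here are illustrative assumptions, not the paper's formulation.

```python
# Illustrative sketch (not the authors' code) of sentence-level
# posterior regularization over extractive answer span probabilities.

def regularize_span_probs(span_probs, sentence_of_span, sentence_has_event, penalty=0.1):
    """span_probs: {span: prob}; sentence_has_event: {sentence_id: bool}."""
    adjusted = {
        span: p * (1.0 if sentence_has_event[sentence_of_span[span]] else penalty)
        for span, p in span_probs.items()
    }
    z = sum(adjusted.values())  # renormalize so probabilities sum to 1
    return {span: p / z for span, p in adjusted.items()}

probs = {"spanA": 0.5, "spanB": 0.3, "spanC": 0.2}
sent_of = {"spanA": 0, "spanB": 1, "spanC": 1}
has_event = {0: False, 1: True}  # sentence 0 contains no answer event
print(regularize_span_probs(probs, sent_of, has_event))
```

After regularization, probability mass shifts from spanA (in the event-free sentence) to spanB and spanC.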

* work in progress 

NapSS: Paragraph-level Medical Text Simplification via Narrative Prompting and Sentence-matching Summarization

Feb 11, 2023
Junru Lu, Jiazheng Li, Byron C. Wallace, Yulan He, Gabriele Pergola

Accessing medical literature is difficult for laypeople as the content is written for specialists and contains medical jargon. Automated text simplification methods offer a potential means to address this issue. In this work, we propose a summarize-then-simplify two-stage strategy, which we call NapSS, identifying the relevant content to simplify while ensuring that the original narrative flow is preserved. In this approach, we first generate reference summaries via sentence matching between the original and the simplified abstracts. These summaries are then used to train an extractive summarizer, learning the most relevant content to be simplified. Then, to ensure the narrative consistency of the simplified text, we synthesize auxiliary narrative prompts combining key phrases derived from the syntactical analyses of the original text. Our model achieves results significantly better than the seq2seq baseline on an English medical corpus, yielding 3%-4% absolute improvements in lexical similarity, and providing a further 1.1% improvement in SARI score when combined with the baseline. We also highlight shortcomings of existing evaluation methods, and introduce new metrics that take into account both lexical and high-level semantic similarity. A human evaluation conducted on a random sample of the test set further establishes the effectiveness of the proposed approach. Code and models are released at https://github.com/LuJunru/NapSS.
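The sentence-matching step used to build reference summaries can be sketched as picking, for each simplified sentence, the original sentence with the highest lexical overlap. The Jaccard score below is an illustrative stand-in for whatever similarity the paper actually uses.

```python
# Toy sketch of reference-summary construction via sentence matching.

def overlap(a, b):
    """Jaccard similarity over lowercase word sets (illustrative scoring)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(1, len(wa | wb))

def reference_summary(original_sents, simplified_sents):
    """For each simplified sentence, keep its best-matching original sentence."""
    picked = []
    for simp in simplified_sents:
        best = max(original_sents, key=lambda o: overlap(o, simp))
        if best not in picked:
            picked.append(best)
    return picked

orig = [
    "The trial randomized 120 patients to receive the drug.",
    "Adverse hepatic events were observed in a minority of participants.",
    "Funding was provided by the national institute.",
]
simp = ["Liver problems were seen in some participants."]
print(reference_summary(orig, simp))
```

The selected original sentences then serve as extractive-summarizer training targets, so the simplifier only sees the content worth simplifying.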

* Findings of EACL 2023 

Event Temporal Relation Extraction with Bayesian Translational Model

Feb 10, 2023
Xingwei Tan, Gabriele Pergola, Yulan He

Existing models to extract temporal relations between events lack a principled method to incorporate external knowledge. In this study, we introduce Bayesian-Trans, a Bayesian learning-based method that models the temporal relation representations as latent variables and infers their values via Bayesian inference and translational functions. Compared to conventional neural approaches, instead of performing point estimation to find the best set of parameters, the proposed model infers the parameters' posterior distribution directly, enhancing the model's capability to encode and express uncertainty about the predictions. Experimental results on three widely used datasets show that Bayesian-Trans outperforms existing approaches for event temporal relation extraction. We additionally present detailed analyses on uncertainty quantification, comparison of priors, and ablation studies, illustrating the benefits of the proposed approach.
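The contrast between point estimation and posterior inference can be sketched with a translational (TransE-style) score whose relation vector is sampled rather than fixed: sampling yields both a mean score and a spread that quantifies uncertainty. The Gaussian posterior and all numbers below are illustrative assumptions, not the paper's variational setup.

```python
# Hand-wavy sketch of Monte Carlo scoring with a latent relation vector.
import math
import random

def trans_score(head, tail, rel):
    """TransE-style score: smaller ||head + rel - tail|| is more plausible."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, rel, tail)))

def mc_score(head, tail, rel_mean, rel_std, n_samples=200, seed=0):
    """Monte Carlo estimate of the mean score and its spread (uncertainty)."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_samples):
        rel = [rng.gauss(m, rel_std) for m in rel_mean]  # sample a relation vector
        scores.append(trans_score(head, tail, rel))
    mean = sum(scores) / n_samples
    var = sum((s - mean) ** 2 for s in scores) / n_samples
    return mean, math.sqrt(var)

print(mc_score([0.0, 0.0], [1.0, 0.0], rel_mean=[1.0, 0.0], rel_std=0.1))
```

A wider posterior over the relation vector directly translates into a wider spread of scores, which is what lets the model express uncertainty about a prediction.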

* 9 pages + 2 

Event-Centric Question Answering via Contrastive Learning and Invertible Event Transformation

Oct 24, 2022
Junru Lu, Xingwei Tan, Gabriele Pergola, Lin Gui, Yulan He

Human reading comprehension often requires reasoning about event semantic relations in narratives, represented by Event-centric Question-Answering (QA). To address event-centric QA, we propose a novel QA model with contrastive learning and invertible event transformation, called TranCLR. Our proposed model utilizes an invertible transformation matrix to project semantic vectors of events into a common event embedding space, trained with contrastive learning, thus naturally injecting event semantic knowledge into mainstream QA pipelines. The transformation matrix is fine-tuned with the annotated event relation types between events that occurred in questions and those in answers, using event-aware question vectors. Experimental results on the Event Semantic Relation Reasoning (ESTER) dataset show significant improvements in both generative and extractive settings compared to the existing strong baselines, achieving over 8.4% gain in the token-level F1 score and 3.0% gain in Exact Match (EM) score under the multi-answer setting. Qualitative analysis reveals the high quality of the generated answers by TranCLR, demonstrating the feasibility of injecting event knowledge into QA model learning. Our code and models can be found at https://github.com/LuJunru/TranCLR.
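The core idea — projecting event vectors through a shared (invertible) matrix and training them with a contrastive objective — can be sketched with a minimal InfoNCE-style loss. The matrix, temperature, and vectors below are illustrative; this is not the released TranCLR code.

```python
# Illustrative sketch of contrastive learning over linearly projected events.
import math

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def contrastive_loss(anchor, positive, negatives, T, temp=0.1):
    """InfoNCE over events projected by the (assumed invertible) matrix T."""
    a = matvec(T, anchor)
    sims = [cosine(a, matvec(T, positive))] + [cosine(a, matvec(T, n)) for n in negatives]
    exps = [math.exp(s / temp) for s in sims]
    return -math.log(exps[0] / sum(exps))

I = [[1.0, 0.0], [0.0, 1.0]]  # identity projection for demonstration
print(contrastive_loss([1.0, 0.0], [1.0, 0.0], [[-1.0, 0.0]], I))
```

Minimizing this loss pulls semantically related events together in the projected space while pushing unrelated ones apart; invertibility of the projection preserves the original event information.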

* Findings of EMNLP 2022 

PHEE: A Dataset for Pharmacovigilance Event Extraction from Text

Oct 22, 2022
Zhaoyue Sun, Jiazheng Li, Gabriele Pergola, Byron C. Wallace, Bino John, Nigel Greene, Joseph Kim, Yulan He

The primary goal of drug safety researchers and regulators is to promptly identify adverse drug reactions. Doing so may in turn prevent or reduce the harm to patients and ultimately improve public health. Evaluating and monitoring drug safety (i.e., pharmacovigilance) involves analyzing an ever-growing collection of spontaneous reports from health professionals, physicians, and pharmacists, and information voluntarily submitted by patients. In this scenario, facilitating analysis of such reports via automation has the potential to rapidly identify safety signals. Unfortunately, public resources for developing natural language models for this task are scant. We present PHEE, a novel dataset for pharmacovigilance comprising over 5000 annotated events from medical case reports and biomedical literature, making it the largest such public dataset to date. We describe the hierarchical event schema designed to provide coarse and fine-grained information about patients' demographics, treatments and (side) effects. Along with the discussion of the dataset, we present a thorough experimental evaluation of current state-of-the-art approaches for biomedical event extraction, point out their limitations, and highlight open challenges to foster future research in this area.
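To make the hierarchical schema concrete, here is a hypothetical annotation showing a coarse event type with fine-grained arguments. The field names and values are invented for illustration and are not PHEE's actual schema.

```python
# Hypothetical annotation in the spirit of PHEE's hierarchical event schema:
# a coarse event type plus nested, fine-grained argument structure.
example = {
    "event_type": "Adverse_event",          # coarse-grained label
    "trigger": "developed",                  # event trigger span in the text
    "arguments": {                           # fine-grained arguments
        "subject": {"text": "a 62-year-old woman", "age": "62", "gender": "female"},
        "treatment": {"drug": "amoxicillin"},
        "effect": "a severe skin rash",
    },
}
print(example["event_type"])
```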

* 17 pages, 3 figures, EMNLP2022 accepted 

Disentangled Learning of Stance and Aspect Topics for Vaccine Attitude Detection in Social Media

May 06, 2022
Lixing Zhu, Zheng Fang, Gabriele Pergola, Rob Procter, Yulan He

Building models to detect vaccine attitudes on social media is challenging because of the composite, often intricate aspects involved, and the limited availability of annotated data. Existing approaches have relied heavily on supervised training that requires abundant annotations and pre-defined aspect categories. Instead, with the aim of leveraging the large amount of unannotated data now available on vaccination, we propose a novel semi-supervised approach for vaccine attitude detection, called VADet. A variational autoencoding architecture based on language models is employed to learn the topical information of the domain from unlabelled data. Then, the model is fine-tuned with a few manually annotated examples of user attitudes. We validate the effectiveness of VADet on our annotated data and also on an existing vaccination corpus annotated with opinions on vaccines. Our results show that VADet is able to learn disentangled stance and aspect topics, and outperforms existing aspect-based sentiment analysis models on both stance detection and tweet clustering.
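Schematically, disentanglement means the latent code factors into a stance component and an aspect-topic component that downstream heads consume separately. The split and dimensions below are purely illustrative, not VADet's architecture.

```python
# Schematic sketch of a disentangled latent code: one slice carries stance,
# the rest carries aspect-topic information.

def split_latent(z, stance_dim=2):
    """Partition a latent vector into (stance, aspect) components."""
    return z[:stance_dim], z[stance_dim:]

stance, aspect = split_latent([0.9, -0.1, 0.3, 0.2, 0.5])
print(stance, aspect)
```

In a real semi-supervised VAE the two components would be encouraged to stay independent during training, e.g. by separate supervision signals on each slice.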

Extracting Event Temporal Relations via Hyperbolic Geometry

Sep 12, 2021
Xingwei Tan, Gabriele Pergola, Yulan He

Detecting events and their evolution through time is a crucial task in natural language understanding. Recent neural approaches to event temporal relation extraction typically map events to embeddings in the Euclidean space and train a classifier to detect temporal relations between event pairs. However, embeddings in the Euclidean space cannot capture richer asymmetric relations such as event temporal relations. We thus propose to embed events into hyperbolic spaces, which are intrinsically suited to modeling hierarchical structures. We introduce two approaches to encode events and their temporal relations in hyperbolic spaces. One approach leverages hyperbolic embeddings to directly infer event relations through simple geometrical operations. In the second one, we devise an end-to-end architecture composed of hyperbolic neural units tailored for the temporal relation extraction task. Thorough experimental assessments on widely used datasets have shown the benefits of revisiting the tasks on a different geometrical space, resulting in state-of-the-art performance on several standard metrics. Finally, the ablation study and several qualitative analyses highlighted the rich event semantics implicitly encoded into hyperbolic spaces.
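The "simple geometrical operations" available in hyperbolic space start from the Poincaré-ball distance, a standard formula sketched below. How the paper maps distances (and norms) to temporal relations is more involved; this only shows the basic geometry.

```python
# Geodesic distance on the Poincaré ball (points strictly inside the unit ball).
import math

def poincare_distance(u, v):
    """d(u, v) = acosh(1 + 2 * ||u - v||^2 / ((1 - ||u||^2)(1 - ||v||^2)))."""
    sq = lambda x: sum(c * c for c in x)
    diff = sq([a - b for a, b in zip(u, v)])
    denom = (1 - sq(u)) * (1 - sq(v))
    return math.acosh(1 + 2 * diff / denom)

# Distances blow up near the boundary, which is what makes the space
# well suited to tree-like (hierarchical, asymmetric) structure.
print(poincare_distance([0.0, 0.0], [0.5, 0.0]))
print(poincare_distance([0.0, 0.0], [0.9, 0.0]))
```

Note that d(origin, [0.5, 0]) already exceeds the Euclidean distance 0.5, and the gap grows rapidly toward the boundary.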

* Accepted by EMNLP 2021, 9 pages + 4 pages (References and Appendix) 

Position Bias Mitigation: A Knowledge-Aware Graph Model for Emotion Cause Extraction

Jun 08, 2021
Hanqi Yan, Lin Gui, Gabriele Pergola, Yulan He

The Emotion Cause Extraction (ECE) task aims to identify clauses which contain emotion-evoking information for a particular emotion expressed in text. We observe that a widely-used ECE dataset exhibits a bias that the majority of annotated cause clauses are either directly before their associated emotion clauses or are the emotion clauses themselves. Existing models for ECE tend to exploit such relative position information and suffer from the dataset bias. To investigate the degree of reliance of existing ECE models on clause relative positions, we propose a novel strategy to generate adversarial examples in which the relative position information is no longer the indicative feature of cause clauses. We test the performance of existing models on such adversarial examples and observe a significant performance drop. To address the dataset bias, we propose a novel graph-based method to explicitly model the emotion triggering paths by leveraging commonsense knowledge to enhance the semantic dependencies between a candidate clause and an emotion clause. Experimental results show that our proposed approach performs on par with the existing state-of-the-art methods on the original ECE dataset, and is more robust against adversarial attacks compared to existing models.
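The adversarial-example strategy can be sketched as reordering clauses so that relative position stops being a reliable cue, while the cause label stays attached to its clause. The random shuffle below is an illustrative stand-in for the paper's generation strategy.

```python
# Toy sketch of position-debiasing adversarial example generation for ECE.
import random

def make_adversarial(clauses, cause_idx, seed=0):
    """Reorder clauses; return the new order and the cause clause's new index."""
    order = list(range(len(clauses)))
    rng = random.Random(seed)
    rng.shuffle(order)
    new_clauses = [clauses[i] for i in order]
    return new_clauses, order.index(cause_idx)

clauses = ["he lost his job", "he felt deep sorrow", "he stayed home"]
new_clauses, new_cause_idx = make_adversarial(clauses, cause_idx=0)
print(new_clauses, new_cause_idx)
```

A model that merely memorized "the cause is right before the emotion clause" will degrade on such examples, whereas one modeling semantic dependencies should not.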

* ACL2021 Main Conference Long paper 