Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yulan He

ExDDI: Explaining Drug-Drug Interaction Predictions with Natural Language

Sep 09, 2024

Zhaoyue Sun, Jiazheng Li, Gabriele Pergola, Yulan He

Figure 1 for ExDDI: Explaining Drug-Drug Interaction Predictions with Natural Language

Figure 2 for ExDDI: Explaining Drug-Drug Interaction Predictions with Natural Language

Figure 3 for ExDDI: Explaining Drug-Drug Interaction Predictions with Natural Language

Figure 4 for ExDDI: Explaining Drug-Drug Interaction Predictions with Natural Language

Abstract:Predicting unknown drug-drug interactions (DDIs) is crucial for improving medication safety. Previous efforts in DDI prediction have typically focused on binary classification or predicting DDI categories, with the absence of explanatory insights that could enhance trust in these predictions. In this work, we propose to generate natural language explanations for DDI predictions, enabling the model to reveal the underlying pharmacodynamics and pharmacokinetics mechanisms simultaneously as making the prediction. To do this, we have collected DDI explanations from DDInter and DrugBank and developed various models for extensive experiments and analysis. Our models can provide accurate explanations for unknown DDIs between known drugs. This paper contributes new tools to the field of DDI prediction and lays a solid foundation for further research on generating explanations for DDI predictions.

* 17 pages, 4 figures

Via

Access Paper or Ask Questions

Calibrating LLMs with Preference Optimization on Thought Trees for Generating Rationale in Science Question Scoring

Jun 28, 2024

Jiazheng Li, Hainiu Xu, Zhaoyue Sun, Yuxiang Zhou, David West, Cesare Aloisi, Yulan He

Figure 1 for Calibrating LLMs with Preference Optimization on Thought Trees for Generating Rationale in Science Question Scoring

Figure 2 for Calibrating LLMs with Preference Optimization on Thought Trees for Generating Rationale in Science Question Scoring

Figure 3 for Calibrating LLMs with Preference Optimization on Thought Trees for Generating Rationale in Science Question Scoring

Figure 4 for Calibrating LLMs with Preference Optimization on Thought Trees for Generating Rationale in Science Question Scoring

Abstract:Generating rationales that justify scoring decisions has been a promising way to facilitate explainability in automated scoring systems. However, existing methods do not match the accuracy of classifier-based methods. Plus, the generated rationales often contain hallucinated information. To address these issues, we propose a novel framework capable of generating more faithful rationales and, more importantly, matching performance with classifier-based black-box scoring systems. We first mimic the human assessment process by querying Large Language Models (LLMs) to generate a thought tree. We then summarise intermediate assessment decisions from each thought tree path for creating synthetic rationale data and rationale preference data. Finally, we utilise the generated synthetic data to calibrate LLMs through a two-step training process: supervised fine-tuning and preference optimization. Extensive experimental results demonstrate that our framework achieves a 38% assessment performance improvement in the QWK score compared to prior work while producing higher-quality rationales, as recognised by human evaluators and LLMs. Our work sheds light on the effectiveness of performing preference optimization using synthetic preference data obtained from thought tree paths.

Via

Access Paper or Ask Questions

Weak Reward Model Transforms Generative Models into Robust Causal Event Extraction Systems

Jun 27, 2024

Italo Luis da Silva, Hanqi Yan, Lin Gui, Yulan He

Figure 1 for Weak Reward Model Transforms Generative Models into Robust Causal Event Extraction Systems

Figure 2 for Weak Reward Model Transforms Generative Models into Robust Causal Event Extraction Systems

Figure 3 for Weak Reward Model Transforms Generative Models into Robust Causal Event Extraction Systems

Figure 4 for Weak Reward Model Transforms Generative Models into Robust Causal Event Extraction Systems

Abstract:The inherent ambiguity of cause and effect boundaries poses a challenge in evaluating causal event extraction tasks. Traditional metrics like Exact Match and BertScore poorly reflect model performance, so we trained evaluation models to approximate human evaluation, achieving high agreement. We used them to perform Reinforcement Learning with extraction models to align them with human preference, prioritising semantic understanding. We successfully explored our approach through multiple datasets, including transferring an evaluator trained on one dataset to another as a way to decrease the reliance on human-annotated data. In that vein, we also propose a weak-to-strong supervision method that uses a fraction of the annotated data to train an evaluation model while still achieving high performance in training an RL model. Our code is available at https://github.com/oyarsa/event_extraction/tree/causal-event-extraction.

* 13 pages, 6 figures, 6 tables

Via

Access Paper or Ask Questions

Cascading Large Language Models for Salient Event Graph Generation

Jun 26, 2024

Xingwei Tan, Yuxiang Zhou, Gabriele Pergola, Yulan He

Figure 1 for Cascading Large Language Models for Salient Event Graph Generation

Figure 2 for Cascading Large Language Models for Salient Event Graph Generation

Figure 3 for Cascading Large Language Models for Salient Event Graph Generation

Figure 4 for Cascading Large Language Models for Salient Event Graph Generation

Abstract:Generating event graphs from long documents is challenging due to the inherent complexity of multiple tasks involved such as detecting events, identifying their relationships, and reconciling unstructured input with structured graphs. Recent studies typically consider all events with equal importance, failing to distinguish salient events crucial for understanding narratives. This paper presents CALLMSAE, a CAscading Large Language Model framework for SAlient Event graph generation, which leverages the capabilities of LLMs and eliminates the need for costly human annotations. We first identify salient events by prompting LLMs to generate summaries, from which salient events are identified. Next, we develop an iterative code refinement prompting strategy to generate event relation graphs, removing hallucinated relations and recovering missing edges. Fine-tuning contextualised graph generation models on the LLM-generated graphs outperforms the models trained on CAEVO-generated data. Experimental results on a human-annotated test set show that the proposed method generates salient and more accurate graphs, outperforming competitive baselines.

* 9 + 12 pages

Via

Access Paper or Ask Questions

Encourage or Inhibit Monosemanticity? Revisit Monosemanticity from a Feature Decorrelation Perspective

Jun 25, 2024

Hanqi Yan, Yanzheng Xiang, Guangyi Chen, Yifei Wang, Lin Gui, Yulan He

Figure 1 for Encourage or Inhibit Monosemanticity? Revisit Monosemanticity from a Feature Decorrelation Perspective

Figure 2 for Encourage or Inhibit Monosemanticity? Revisit Monosemanticity from a Feature Decorrelation Perspective

Figure 3 for Encourage or Inhibit Monosemanticity? Revisit Monosemanticity from a Feature Decorrelation Perspective

Figure 4 for Encourage or Inhibit Monosemanticity? Revisit Monosemanticity from a Feature Decorrelation Perspective

Abstract:To better interpret the intrinsic mechanism of large language models (LLMs), recent studies focus on monosemanticity on its basic units. A monosemantic neuron is dedicated to a single and specific concept, which forms a one-to-one correlation between neurons and concepts. Despite extensive research in monosemanticity probing, it remains unclear whether monosemanticity is beneficial or harmful to model capacity. To explore this question, we revisit monosemanticity from the feature decorrelation perspective and advocate for its encouragement. We experimentally observe that the current conclusion by wang2024learning, which suggests that decreasing monosemanticity enhances model performance, does not hold when the model changes. Instead, we demonstrate that monosemanticity consistently exhibits a positive correlation with model capacity, in the preference alignment process. Consequently, we apply feature correlation as a proxy for monosemanticity and incorporate a feature decorrelation regularizer into the dynamic preference optimization process. The experiments show that our method not only enhances representation diversity and activation sparsity but also improves preference alignment performance.

Via

Access Paper or Ask Questions

DrugWatch: A Comprehensive Multi-Source Data Visualisation Platform for Drug Safety Information

Jun 18, 2024

Artem Bobrov, Domantas Saltenis, Zhaoyue Sun, Gabriele Pergola, Yulan He

Abstract:Drug safety research is crucial for maintaining public health, often requiring comprehensive data support. However, the resources currently available to the public are limited and fail to provide a comprehensive understanding of the relationship between drugs and their side effects. This paper introduces DrugWatch, an easy-to-use and interactive multi-source information visualisation platform for drug safety study. It allows users to understand common side effects of drugs and their statistical information, flexibly retrieve relevant medical reports, or annotate their own medical texts with our automated annotation tool. Supported by NLP technology and enriched with interactive visual components, we are committed to providing researchers and practitioners with a one-stop information analysis, retrieval, and annotation service. The demonstration video is available at https://www.youtube.com/watch?v=RTqDgxzETjw. We also deployed an online demonstration system at https://drugwatch.net/.

* 10 pages, 14 figures, accepted by ACL 2024 Demo Track

Via

Access Paper or Ask Questions

Multi-Layer Ranking with Large Language Models for News Source Recommendation

Jun 17, 2024

Wenjia Zhang, Lin Gui, Rob Procter, Yulan He

Abstract:To seek reliable information sources for news events, we introduce a novel task of expert recommendation, which aims to identify trustworthy sources based on their previously quoted statements. To achieve this, we built a novel dataset, called NewsQuote, consisting of 23,571 quote-speaker pairs sourced from a collection of news articles. We formulate the recommendation task as the retrieval of experts based on their likelihood of being associated with a given query. We also propose a multi-layer ranking framework employing Large Language Models to improve the recommendation performance. Our results show that employing an in-context learning based LLM ranker and a multi-layer ranking-based filter significantly improve both the predictive quality and behavioural quality of the recommender system.

* Accepted by the SIGIR 2024. arXiv admin note: text overlap with arXiv:2305.04825

Via

Access Paper or Ask Questions

Eliminating Biased Length Reliance of Direct Preference Optimization via Down-Sampled KL Divergence

Jun 16, 2024

Junru Lu, Jiazheng Li, Siyu An, Meng Zhao, Yulan He, Di Yin, Xing Sun

Abstract:Direct Preference Optimization (DPO) has emerged as a prominent algorithm for the direct and robust alignment of Large Language Models (LLMs) with human preferences, offering a more straightforward alternative to the complex Reinforcement Learning from Human Feedback (RLHF). Despite its promising efficacy, DPO faces a notable drawback: "verbosity", a common over-optimization phenomenon also observed in RLHF. While previous studies mainly attributed verbosity to biased labels within the data, we propose that the issue also stems from an inherent algorithmic length reliance in DPO. Specifically, we suggest that the discrepancy between sequence-level Kullback-Leibler (KL) divergences between chosen and rejected sequences, used in DPO, results in overestimated or underestimated rewards due to varying token lengths. Empirically, we utilize datasets with different label lengths to demonstrate the presence of biased rewards. We then introduce an effective downsampling approach, named SamPO, to eliminate potential length reliance. Our experimental evaluations, conducted across three LLMs of varying scales and a diverse array of conditional and open-ended benchmarks, highlight the efficacy of SamPO in mitigating verbosity, achieving improvements of 5% to 12% over DPO through debaised rewards. Our codes can be accessed at: https://github.com/LuJunru/SamPO/.

Via

Access Paper or Ask Questions

**PLAYER*: Enhancing LLM-based Multi-Agent Communication and Interaction in Murder Mystery Games**

Apr 26, 2024

Qinglin Zhu, Runcong Zhao, Jinhua Du, Lin Gui, Yulan He

Abstract:Recent advancements in Large Language Models (LLMs) have enhanced the efficacy of agent communication and social interactions. Despite these advancements, building LLM-based agents for reasoning in dynamic environments involving competition and collaboration remains challenging due to the limitations of informed graph-based search methods. We propose PLAYER*, a novel framework based on an anytime sampling-based planner, which utilises sensors and pruners to enable a purely question-driven searching framework for complex reasoning tasks. We also introduce a quantifiable evaluation method using multiple-choice questions and construct the WellPlay dataset with 1,482 QA pairs. Experiments demonstrate PLAYER*'s efficiency and performance enhancements compared to existing methods in complex, dynamic environments with quantifiable results.

Via

Access Paper or Ask Questions

Can We Catch the Elephant? The Evolvement of Hallucination Evaluation on Natural Language Generation: A Survey

Apr 18, 2024

Siya Qi, Yulan He, Zheng Yuan

Abstract:Hallucination in Natural Language Generation (NLG) is like the elephant in the room, obvious but often overlooked until recent achievements significantly improved the fluency and grammatical accuracy of generated text. For Large Language Models (LLMs), hallucinations can happen in various downstream tasks and casual conversations, which need accurate assessment to enhance reliability and safety. However, current studies on hallucination evaluation vary greatly, and people still find it difficult to sort out and select the most appropriate evaluation methods. Moreover, as NLP research gradually shifts to the domain of LLMs, it brings new challenges to this direction. This paper provides a comprehensive survey on the evolvement of hallucination evaluation methods, aiming to address three key aspects: 1) Diverse definitions and granularity of facts; 2) The categories of automatic evaluators and their applicability; 3) Unresolved issues and future directions.

* 19 pages in total, with 9 pages as main body. Under review as a conference paper at CoLM 2024

Via

Access Paper or Ask Questions