Gabriele Pergola

Explaining Matters: Leveraging Definitions and Semantic Expansion for Sexism Detection

Jun 06, 2025

Sahrish Khan, Arshad Jhumka, Gabriele Pergola

Abstract:The detection of sexism in online content remains an open problem, as harmful language disproportionately affects women and marginalized groups. While automated systems for sexism detection have been developed, they still face two key challenges: data sparsity and the nuanced nature of sexist language. Even in large, well-curated datasets like the Explainable Detection of Online Sexism (EDOS), severe class imbalance hinders model generalization. Additionally, the overlapping and ambiguous boundaries of fine-grained categories introduce substantial annotator disagreement, reflecting the difficulty of interpreting nuanced expressions of sexism. To address these challenges, we propose two prompt-based data augmentation techniques: Definition-based Data Augmentation (DDA), which leverages category-specific definitions to generate semantically-aligned synthetic examples, and Contextual Semantic Expansion (CSE), which targets systematic model errors by enriching examples with task-specific semantic features. To further improve reliability in fine-grained classification, we introduce an ensemble strategy that resolves prediction ties by aggregating complementary perspectives from multiple language models. Our experimental evaluation on the EDOS dataset demonstrates state-of-the-art performance across all tasks, with notable improvements of macro F1 by 1.5 points for binary classification (Task A) and 4.1 points for fine-grained classification (Task C).

* Proceedings of the 2025 Annual Meeting of the Association for Computational Linguistics (ACL). ACL 2025 - Main Conference
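
The abstract above mentions an ensemble strategy that resolves prediction ties across multiple language models, but it does not spell out the rule. As a rough illustration only, the sketch below assumes a simple scheme: majority voting over model predictions, with ties broken by the summed confidence of the tied labels. The function name, the `(label, confidence)` input format, and the tie-breaking rule are all assumptions for this example, not the authors' method.

```python
from collections import Counter, defaultdict


def ensemble_vote(predictions):
    """Pick a label from per-model (label, confidence) pairs.

    Hypothetical scheme: majority vote first; if several labels share
    the top vote count, choose the one with the highest summed confidence.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")

    counts = Counter(label for label, _ in predictions)
    top_votes = max(counts.values())
    tied = [label for label, votes in counts.items() if votes == top_votes]
    if len(tied) == 1:
        return tied[0]

    # Tie: aggregate confidence over the tied labels only.
    confidence = defaultdict(float)
    for label, score in predictions:
        if label in tied:
            confidence[label] += score
    # sorted() makes the result deterministic if confidences are also equal.
    return max(sorted(tied), key=lambda label: confidence[label])


# Two models disagree, so the more confident one decides.
print(ensemble_vote([("category_1", 0.9), ("category_2", 0.6)]))
```

Breaking ties by confidence is only one option; a real system might instead defer to a designated stronger model or to a validation-set prior.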

SafeSpeech: A Comprehensive and Interactive Tool for Analysing Sexist and Abusive Language in Conversations

Mar 09, 2025

Xingwei Tan, Chen Lyu, Hafiz Muhammad Umer, Sahrish Khan, Mahathi Parvatham, Lois Arthurs, Simon Cullen, Shelley Wilson, Arshad Jhumka, Gabriele Pergola

Abstract:Detecting toxic language, including sexism, harassment and abusive behaviour, remains a critical challenge, particularly in its subtle and context-dependent forms. Existing approaches largely focus on isolated message-level classification, overlooking toxicity that emerges across conversational contexts. To promote and enable future research in this direction, we introduce SafeSpeech, a comprehensive platform for toxic content detection and analysis that bridges message-level and conversation-level insights. The platform integrates fine-tuned classifiers and large language models (LLMs) to enable multi-granularity detection, toxic-aware conversation summarization, and persona profiling. SafeSpeech also incorporates explainability mechanisms, such as perplexity gain analysis, to highlight the linguistic elements driving predictions. Evaluations on benchmark datasets, including EDOS, OffensEval, and HatEval, demonstrate the reproduction of state-of-the-art performance across multiple tasks, including fine-grained sexism detection.

* NAACL 2025 system demonstration camera-ready

SciGisPy: a Novel Metric for Biomedical Text Simplification via Gist Inference Score

Oct 12, 2024

Chen Lyu, Gabriele Pergola

Abstract:Biomedical literature is often written in highly specialized language, posing significant comprehension challenges for non-experts. Automatic text simplification (ATS) offers a solution by making such texts more accessible while preserving critical information. However, evaluating ATS for biomedical texts is still challenging due to the limitations of existing evaluation metrics. General-domain metrics like SARI, BLEU, and ROUGE focus on surface-level text features, and readability metrics like FKGL and ARI fail to account for domain-specific terminology or assess how well the simplified text conveys core meanings (gist). To address this, we introduce SciGisPy, a novel evaluation metric inspired by Gist Inference Score (GIS) from Fuzzy-Trace Theory (FTT). SciGisPy measures how well a simplified text facilitates the formation of abstract inferences (gist) necessary for comprehension, especially in the biomedical domain. We revise GIS for this purpose by introducing domain-specific enhancements, including semantic chunking, Information Content (IC) theory, and specialized embeddings, while removing unsuitable indexes. Our experimental evaluation on the Cochrane biomedical text simplification dataset demonstrates that SciGisPy outperforms the original GIS formulation, with a significant increase in correctly identified simplified texts (84% versus 44.8%). The results and a thorough ablation study confirm that SciGisPy better captures the essential meaning of biomedical content, outperforming existing approaches.

* Accepted by the Third Workshop on Text Simplification, Accessibility and Readability
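
To make concrete why the SciGisPy abstract calls readability metrics such as ARI surface-level, here is a minimal sketch of the standard Automated Readability Index formula, 4.71 × (characters / words) + 0.5 × (words / sentences) − 21.43. The tokenisation (regex-based word and sentence splitting) is a simplifying assumption for this example, and the code is unrelated to SciGisPy itself.

```python
import re


def automated_readability_index(text):
    """Compute ARI from character, word, and sentence counts only."""
    # Naive sentence split on terminal punctuation (assumption for this sketch).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z0-9]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word and one sentence")
    characters = sum(len(word) for word in words)
    return (4.71 * characters / len(words)
            + 0.5 * len(words) / len(sentences)
            - 21.43)


helps = automated_readability_index("The drug helps the heart.")
harms = automated_readability_index("The drug harms the heart.")
# Opposite meanings, identical score: the metric only sees word and sentence lengths.
print(helps == harms)
```

Because the score depends only on lengths, swapping "helps" for "harms" leaves it unchanged, which is the kind of blindness to meaning that motivates a gist-oriented metric.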

Society of Medical Simplifiers

Oct 12, 2024

Chen Lyu, Gabriele Pergola

Abstract:Medical text simplification is crucial for making complex biomedical literature more accessible to non-experts. Traditional methods struggle with the specialized terms and jargon of medical texts, lacking the flexibility to adapt the simplification process dynamically. In contrast, recent advancements in large language models (LLMs) present unique opportunities by offering enhanced control over text simplification through iterative refinement and collaboration between specialized agents. In this work, we introduce the Society of Medical Simplifiers, a novel LLM-based framework inspired by the "Society of Mind" (SOM) philosophy. Our approach leverages the strengths of LLMs by assigning five distinct roles, i.e., Layperson, Simplifier, Medical Expert, Language Clarifier, and Redundancy Checker, organized into interaction loops. This structure allows the agents to progressively improve text simplification while maintaining the complexity and accuracy of the original content. Evaluations on the Cochrane text simplification dataset demonstrate that our framework is on par with or outperforms state-of-the-art methods, achieving superior readability and content preservation through controlled simplification processes.

* Accepted by the Third Workshop on Text Simplification, Accessibility and Readability

ExDDI: Explaining Drug-Drug Interaction Predictions with Natural Language

Sep 09, 2024

Zhaoyue Sun, Jiazheng Li, Gabriele Pergola, Yulan He

Abstract:Predicting unknown drug-drug interactions (DDIs) is crucial for improving medication safety. Previous efforts in DDI prediction have typically focused on binary classification or predicting DDI categories, without the explanatory insights that could enhance trust in these predictions. In this work, we propose to generate natural language explanations for DDI predictions, enabling the model to reveal the underlying pharmacodynamic and pharmacokinetic mechanisms while making the prediction. To do this, we collected DDI explanations from DDInter and DrugBank and developed various models for extensive experiments and analysis. Our models can provide accurate explanations for unknown DDIs between known drugs. This paper contributes new tools to the field of DDI prediction and lays a solid foundation for further research on generating explanations for DDI predictions.

* 17 pages, 4 figures

Cascading Large Language Models for Salient Event Graph Generation

Jun 26, 2024

Xingwei Tan, Yuxiang Zhou, Gabriele Pergola, Yulan He

Abstract:Generating event graphs from long documents is challenging due to the inherent complexity of the multiple tasks involved, such as detecting events, identifying their relationships, and reconciling unstructured input with structured graphs. Recent studies typically consider all events with equal importance, failing to distinguish salient events crucial for understanding narratives. This paper presents CALLMSAE, a CAscading Large Language Model framework for SAlient Event graph generation, which leverages the capabilities of LLMs and eliminates the need for costly human annotations. We first prompt LLMs to generate summaries, from which salient events are identified. Next, we develop an iterative code refinement prompting strategy to generate event relation graphs, removing hallucinated relations and recovering missing edges. Fine-tuning contextualised graph generation models on the LLM-generated graphs outperforms the models trained on CAEVO-generated data. Experimental results on a human-annotated test set show that the proposed method generates salient and more accurate graphs, outperforming competitive baselines.

* 9 + 12 pages

DrugWatch: A Comprehensive Multi-Source Data Visualisation Platform for Drug Safety Information

Jun 18, 2024

Artem Bobrov, Domantas Saltenis, Zhaoyue Sun, Gabriele Pergola, Yulan He

Abstract:Drug safety research is crucial for maintaining public health, often requiring comprehensive data support. However, the resources currently available to the public are limited and fail to provide a comprehensive understanding of the relationship between drugs and their side effects. This paper introduces DrugWatch, an easy-to-use and interactive multi-source information visualisation platform for drug safety study. It allows users to understand common side effects of drugs and their statistical information, flexibly retrieve relevant medical reports, or annotate their own medical texts with our automated annotation tool. Supported by NLP technology and enriched with interactive visual components, we are committed to providing researchers and practitioners with a one-stop information analysis, retrieval, and annotation service. The demonstration video is available at https://www.youtube.com/watch?v=RTqDgxzETjw. We also deployed an online demonstration system at https://drugwatch.net/.

* 10 pages, 14 figures, accepted by ACL 2024 Demo Track

Large Multimodal Model based Standardisation of Pathology Reports with Confidence and their Prognostic Significance

May 03, 2024

Ethar Alzaid, Gabriele Pergola, Harriet Evans, David Snead, Fayyaz Minhas

Abstract:Pathology reports are rich in clinical and pathological details but are often presented in free-text format. The unstructured nature of these reports presents a significant challenge, limiting the accessibility of their content. In this work, we present a practical approach based on the use of large multimodal models (LMMs) for automatically extracting information from scanned images of pathology reports with the goal of generating a standardised report specifying the value of different fields along with estimated confidence about the accuracy of the extracted fields. The proposed approach overcomes limitations of existing methods, which do not assign confidence scores to extracted fields, limiting their practical use. The proposed framework uses two stages of prompting a Large Multimodal Model (LMM) for information extraction and validation. The framework generalises to textual reports from multiple medical centres as well as scanned images of legacy pathology reports. We show that the estimated confidence is an effective indicator of the accuracy of the extracted information that can be used to select only accurately extracted fields. We also show the prognostic significance of structured and unstructured data from pathology reports and show that the automatically extracted field values have significant prognostic value for patient stratification. The framework is available for evaluation via the URL: https://labieb.dcs.warwick.ac.uk/.

* 19 pages, 6 figures

Set-Aligning Framework for Auto-Regressive Event Temporal Graph Generation

Apr 01, 2024

Xingwei Tan, Yuxiang Zhou, Gabriele Pergola, Yulan He

Abstract:Event temporal graphs have been shown to be convenient and effective representations of complex temporal relations between events in text. Recent studies, which employ pre-trained language models to auto-regressively generate linearised graphs for constructing event temporal graphs, have shown promising results. However, these methods have often led to suboptimal graph generation as the linearised graphs exhibit set characteristics which are instead treated sequentially by language models. This discrepancy stems from the conventional text generation objectives, leading to erroneous penalisation of correct predictions caused by the misalignment of elements in target sequences. To address these challenges, we reframe the task as a conditional set generation problem, proposing a Set-aligning Framework tailored for the effective utilisation of Large Language Models (LLMs). The framework incorporates data augmentations and set-property regularisations designed to alleviate text generation loss penalties associated with the linearised graph edge sequences, thus encouraging the generation of more relation edges. Experimental results show that our framework surpasses existing baselines for event temporal graph generation. Furthermore, under zero-shot settings, the structural knowledge introduced through our framework notably improves model generalisation, particularly when the training examples available are limited.

* Accepted to NAACL 2024. 9 + 10 pages
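
The mismatch described in the abstract above, where a set of graph edges is scored as if it were an ordered sequence, can be shown with a toy comparison. The sketch below is not the paper's framework or loss; it only contrasts an order-sensitive exact match with an order-invariant edge-set F1. The `(head, relation, tail)` triple format and both function names are assumptions made for this example.

```python
def exact_sequence_match(pred_edges, gold_edges):
    """Order-sensitive comparison: a reordered but correct prediction fails."""
    return list(pred_edges) == list(gold_edges)


def edge_set_f1(pred_edges, gold_edges):
    """Order-invariant F1 over edges treated as a set."""
    pred, gold = set(pred_edges), set(gold_edges)
    true_positives = len(pred & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


gold = [("arrive", "before", "meet"), ("meet", "before", "leave")]
pred = [("meet", "before", "leave"), ("arrive", "before", "meet")]  # same edges, other order

print(exact_sequence_match(pred, gold))  # order-sensitive view
print(edge_set_f1(pred, gold))           # set view
```

The same tension applies to token-level generation losses: a target sequence fixes one arbitrary edge order, so a model that emits the correct edges in a different order is penalised even though the graph is identical.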

Leveraging ChatGPT in Pharmacovigilance Event Extraction: An Empirical Study

Feb 24, 2024

Zhaoyue Sun, Gabriele Pergola, Byron C. Wallace, Yulan He

Abstract:With the advent of large language models (LLMs), there has been growing interest in exploring their potential for medical applications. This research aims to investigate the ability of LLMs, specifically ChatGPT, in the context of pharmacovigilance event extraction, of which the main goal is to identify and extract adverse events or potential therapeutic events from textual medical sources. We conduct extensive experiments to assess the performance of ChatGPT in the pharmacovigilance event extraction task, employing various prompts and demonstration selection strategies. The findings show that while ChatGPT achieves reasonable performance with appropriate demonstration selection strategies, it still falls short compared to fully fine-tuned small models. Additionally, we explore the potential of leveraging ChatGPT for data augmentation. However, our investigation reveals that the inclusion of synthesized data into fine-tuning may lead to a decrease in performance, possibly attributed to noise in the ChatGPT-generated labels. To mitigate this, we explore different filtering strategies and find that, with the proper approach, more stable performance can be achieved, although constant improvement remains elusive.

* 14 pages, 2 figures, accepted by EACL 2024
