Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Mahdi Dhaini

Facts Fade Fast: Evaluating Memorization of Outdated Medical Knowledge in Large Language Models

Sep 04, 2025

Juraj Vladika, Mahdi Dhaini, Florian Matthes

Abstract:The growing capabilities of Large Language Models (LLMs) show significant potential to enhance healthcare by assisting medical researchers and physicians. However, their reliance on static training data is a major risk when medical recommendations evolve with new research and developments. When LLMs memorize outdated medical knowledge, they can provide harmful advice or fail at clinical reasoning tasks. To investigate this problem, we introduce two novel question-answering (QA) datasets derived from systematic reviews: MedRevQA (16,501 QA pairs covering general biomedical knowledge) and MedChangeQA (a subset of 512 QA pairs where medical consensus has changed over time). Our evaluation of eight prominent LLMs on the datasets reveals consistent reliance on outdated knowledge across all models. We additionally analyze the influence of obsolete pre-training data and training strategies to explain this phenomenon and propose future directions for mitigation, laying the groundwork for developing more current and reliable medical AI systems.

* Accepted to Findings of EMNLP 2025

Via

Access Paper or Ask Questions

Gender Bias in Explainability: Investigating Performance Disparity in Post-hoc Methods

May 02, 2025

Mahdi Dhaini, Ege Erdogan, Nils Feldhus, Gjergji Kasneci

Figure 1 for Gender Bias in Explainability: Investigating Performance Disparity in Post-hoc Methods

Figure 2 for Gender Bias in Explainability: Investigating Performance Disparity in Post-hoc Methods

Figure 3 for Gender Bias in Explainability: Investigating Performance Disparity in Post-hoc Methods

Figure 4 for Gender Bias in Explainability: Investigating Performance Disparity in Post-hoc Methods

Abstract:While research on applications and evaluations of explanation methods continues to expand, fairness of the explanation methods concerning disparities in their performance across subgroups remains an often overlooked aspect. In this paper, we address this gap by showing that, across three tasks and five language models, widely used post-hoc feature attribution methods exhibit significant gender disparity with respect to their faithfulness, robustness, and complexity. These disparities persist even when the models are pre-trained or fine-tuned on particularly unbiased datasets, indicating that the disparities we observe are not merely consequences of biased training data. Our results highlight the importance of addressing disparities in explanations when developing and applying explainability methods, as these can lead to biased outcomes against certain subgroups, with particularly critical implications in high-stakes contexts. Furthermore, our findings underscore the importance of incorporating the fairness of explanations, alongside overall model fairness and explainability, as a requirement in regulatory frameworks.

* Accepted to ACM Conference on Fairness, Accountability, and Transparency (FAccT) 2025

Via

Access Paper or Ask Questions

EvalxNLP: A Framework for Benchmarking Post-Hoc Explainability Methods on NLP Models

May 02, 2025

Mahdi Dhaini, Kafaite Zahra Hussain, Efstratios Zaradoukas, Gjergji Kasneci

Abstract:As Natural Language Processing (NLP) models continue to evolve and become integral to high-stakes applications, ensuring their interpretability remains a critical challenge. Given the growing variety of explainability methods and diverse stakeholder requirements, frameworks that help stakeholders select appropriate explanations tailored to their specific use cases are increasingly important. To address this need, we introduce EvalxNLP, a Python framework for benchmarking state-of-the-art feature attribution methods for transformer-based NLP models. EvalxNLP integrates eight widely recognized explainability techniques from the Explainable AI (XAI) literature, enabling users to generate and evaluate explanations based on key properties such as faithfulness, plausibility, and complexity. Our framework also provides interactive, LLM-based textual explanations, facilitating user understanding of the generated explanations and evaluation outcomes. Human evaluation results indicate high user satisfaction with EvalxNLP, suggesting it is a promising framework for benchmarking explanation methods across diverse user groups. By offering a user-friendly and extensible platform, EvalxNLP aims at democratizing explainability tools and supporting the systematic comparison and advancement of XAI techniques in NLP.

* Accepted to the xAI World Conference (2025) - System Demonstration

Via

Access Paper or Ask Questions

Detecting ChatGPT: A Survey of the State of Detecting ChatGPT-Generated Text

Sep 14, 2023

Mahdi Dhaini, Wessel Poelman, Ege Erdogan

Figure 1 for Detecting ChatGPT: A Survey of the State of Detecting ChatGPT-Generated Text

Figure 2 for Detecting ChatGPT: A Survey of the State of Detecting ChatGPT-Generated Text

Abstract:While recent advancements in the capabilities and widespread accessibility of generative language models, such as ChatGPT (OpenAI, 2022), have brought about various benefits by generating fluent human-like text, the task of distinguishing between human- and large language model (LLM) generated text has emerged as a crucial problem. These models can potentially deceive by generating artificial text that appears to be human-generated. This issue is particularly significant in domains such as law, education, and science, where ensuring the integrity of text is of the utmost importance. This survey provides an overview of the current approaches employed to differentiate between texts generated by humans and ChatGPT. We present an account of the different datasets constructed for detecting ChatGPT-generated text, the various methods utilized, what qualitative analyses into the characteristics of human versus ChatGPT-generated text have been performed, and finally, summarize our findings into general insights

* Published in the Proceedings of the Student Research Workshop associated with RANLP-2023

Via

Access Paper or Ask Questions