Derek Greene

Evaluating LLM-Driven Summarisation of Parliamentary Debates with Computational Argumentation

Apr 21, 2026

Eoghan Cunningham, Derek Greene, James Cross, Antonio Rago

Abstract:Understanding how policy is debated and justified in parliament is a fundamental aspect of the democratic process. However, the volume and complexity of such debates mean that outside audiences struggle to engage. Meanwhile, Large Language Models (LLMs) have been shown to enable automated summarisation at scale. While summaries of debates can make parliamentary procedures more accessible, evaluating whether these summaries faithfully communicate argumentative content remains challenging. Existing automated summarisation metrics have been shown to correlate poorly with human judgements of consistency (i.e., faithfulness or alignment between summary and source). In this work, we propose a formal framework for evaluating parliamentary debate summaries that grounds argument structures in the contested proposals up for debate. Our novel approach, driven by computational argumentation, focuses the evaluation on formal properties concerning the faithful preservation of the reasoning presented to justify or oppose policy outcomes. We demonstrate our methods using a case study of debates from the European Parliament and associated LLM-driven summaries.

* Accepted at KR'26 In The Wild Track. Camera ready to follow

Cultural Analytics for Good: Building Inclusive Evaluation Frameworks for Historical IR

Jan 17, 2026

Suchana Datta, Dwaipayan Roy, Derek Greene, Gerardine Meaney, Karen Wade, Philipp Mayr

Abstract:This work bridges the fields of information retrieval and cultural analytics to support equitable access to historical knowledge. Using the British Library BL19 digital collection (more than 35,000 works from 1700-1899), we construct a benchmark for studying changes in language, terminology and retrieval in 19th-century fiction and non-fiction. Our approach combines expert-driven query design, paragraph-level relevance annotation, and Large Language Model (LLM) assistance to create a scalable evaluation framework grounded in human expertise. We focus on knowledge transfer from fiction to non-fiction, investigating how narrative understanding and semantic richness in fiction can improve retrieval for scholarly and factual materials. This interdisciplinary framework not only improves retrieval accuracy but also fosters interpretability, transparency, and cultural inclusivity in digital archives. Our work provides both practical evaluation resources and a methodological paradigm for developing retrieval systems that support richer, historically aware engagement with digital archives, ultimately working towards more emancipatory knowledge infrastructures.

PreP-OCR: A Complete Pipeline for Document Image Restoration and Enhanced OCR Accuracy

May 28, 2025

Shuhao Guan, Moule Lin, Cheng Xu, Xinyi Liu, Jinman Zhao, Jiexin Fan, Qi Xu, Derek Greene

Abstract:This paper introduces PreP-OCR, a two-stage pipeline that combines document image restoration with semantic-aware post-OCR correction to enhance both visual clarity and textual consistency, thereby improving text extraction from degraded historical documents. First, we synthesize document-image pairs from plaintext, rendering them with diverse fonts and layouts and then applying a randomly ordered set of degradation operations. An image restoration model is trained on this synthetic data, using multi-directional patch extraction and fusion to process large images. Second, a ByT5 post-OCR model, fine-tuned on synthetic historical text pairs, addresses remaining OCR errors. Detailed experiments on 13,831 pages of real historical documents in English, French, and Spanish show that the PreP-OCR pipeline reduces character error rates by 63.9-70.3% compared to OCR on raw images. Our pipeline demonstrates the potential of integrating image restoration with linguistic error correction for digitizing historical archives.

* ACL 2025 main
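
The reported improvement is a relative reduction in character error rate (CER). As a minimal sketch of how such a figure can be computed, the snippet below uses the common definition of CER as Levenshtein edit distance divided by reference length. The abstract does not state the exact CER variant used, so this definition and the function names `levenshtein`, `cer`, and `relative_reduction` are assumptions for illustration only.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Assumed definition: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference text must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


def relative_reduction(baseline_cer: float, improved_cer: float) -> float:
    """Fraction by which the improved CER is lower than the baseline CER."""
    if baseline_cer <= 0:
        raise ValueError("baseline CER must be positive")
    return (baseline_cer - improved_cer) / baseline_cer
```

Under this definition, a baseline CER of 0.20 that drops to 0.06 after correction corresponds to a relative reduction of 70%.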

Combining Query Performance Predictors: A Reproducibility Study

Mar 31, 2025

Sourav Saha, Suchana Datta, Dwaipayan Roy, Mandar Mitra, Derek Greene

Abstract:A large number of approaches to Query Performance Prediction (QPP) have been proposed over the last two decades. As early as 2009, Hauff et al. [28] explored whether different QPP methods may be combined to improve prediction quality. Since then, significant research has been done both on QPP approaches and on their evaluation. This study revisits Hauff et al.'s work to assess the reproducibility of their findings in the light of new prediction methods, evaluation metrics, and datasets. We expand the scope of the earlier investigation by: (i) considering post-retrieval methods, including supervised neural techniques (only pre-retrieval techniques were studied in [28]); (ii) using sMARE for evaluation, in addition to the traditional correlation coefficients and RMSE; and (iii) experimenting with additional datasets (ClueWeb09B and TREC DL). Our results largely support previous claims, but we also present several interesting findings. We interpret these findings by taking a more nuanced look at the correlation between QPP methods, examining whether they capture diverse information or rely on overlapping factors.
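
For readers unfamiliar with sMARE (scaled Mean Absolute Rank Error), the sketch below follows the definition as it is commonly given in the QPP evaluation literature: each query is ranked once by predicted score and once by actual effectiveness, and the absolute rank difference is scaled by the number of queries and then averaged. The abstract does not spell out the formula, so this is an assumed reading; the function names and the tie-breaking rule (by query id) are hypothetical.

```python
def ranks(scores: dict) -> dict:
    """Rank queries from 1 (highest score); ties broken by query id (assumption)."""
    ordered = sorted(scores, key=lambda q: (-scores[q], q))
    return {q: i for i, q in enumerate(ordered, start=1)}


def smare(predicted: dict, actual: dict) -> float:
    """Mean over queries of |predicted rank - actual rank| / number of queries."""
    if not predicted or predicted.keys() != actual.keys():
        raise ValueError("both inputs must cover the same non-empty query set")
    n = len(predicted)
    rank_pred, rank_act = ranks(predicted), ranks(actual)
    return sum(abs(rank_pred[q] - rank_act[q]) / n for q in predicted) / n
```

Lower values indicate that the predictor orders queries more like their true effectiveness; a perfect ordering gives 0.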

Unveiling Temporal Trends in 19th Century Literature: An Information Retrieval Approach

Jan 12, 2025

Suchana Datta, Dwaipayan Roy, Derek Greene, Gerardine Meaney

Abstract:In English literature, the 19th century witnessed a significant transition in styles, themes, and genres. Consequently, the novels from this period display remarkable diversity. This paper explores these variations by examining the evolution of term usage in 19th-century English novels through the lens of information retrieval. By applying a query expansion-based approach to a decade-segmented collection of fiction from the British Library, we examine how related terms vary over time. Our analysis employs multiple standard metrics including Kendall's tau, Jaccard similarity, and Jensen-Shannon divergence to assess overlaps and shifts in expanded query term sets. Our results indicate a significant degree of divergence in the related terms across decades as selected by the query expansion technique, suggesting substantial linguistic and conceptual changes across 19th-century novels.

* Accepted at JCDL 2024
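
The three metrics named in the abstract can be illustrated with a small standard-library sketch that compares the expansion terms obtained for the same query in two decades. How the paper weights terms, handles terms missing from one list, or treats ties is not stated, so the choices below (Jaccard on term sets, Jensen-Shannon divergence in bits over normalised term weights, Kendall's tau-a over the terms common to both rankings) are assumptions, as are the function names.

```python
import math
from itertools import combinations


def jaccard(terms_a, terms_b) -> float:
    """Overlap of two term sets: |A intersect B| / |A union B|."""
    a, b = set(terms_a), set(terms_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def js_divergence(weights_p: dict, weights_q: dict) -> float:
    """Jensen-Shannon divergence (base 2) between two positive term-weight maps."""
    total_p, total_q = sum(weights_p.values()), sum(weights_q.values())
    divergence = 0.0
    for term in set(weights_p) | set(weights_q):
        p = weights_p.get(term, 0.0) / total_p
        q = weights_q.get(term, 0.0) / total_q
        m = 0.5 * (p + q)
        if p > 0:
            divergence += 0.5 * p * math.log2(p / m)
        if q > 0:
            divergence += 0.5 * q * math.log2(q / m)
    return divergence


def kendall_tau(ranking_a: list, ranking_b: list) -> float:
    """Kendall's tau-a over terms present in both duplicate-free rankings."""
    pos_a = {t: i for i, t in enumerate(ranking_a)}
    pos_b = {t: i for i, t in enumerate(ranking_b)}
    common = [t for t in ranking_a if t in pos_b]
    if len(common) < 2:
        raise ValueError("need at least two shared terms")
    score = 0
    for x, y in combinations(common, 2):
        same_order = (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0
        score += 1 if same_order else -1
    n = len(common)
    return score / (n * (n - 1) / 2)
```

With base-2 logarithms the divergence lies between 0 (identical distributions) and 1 (no shared terms), which makes values comparable across decade pairs.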

Transformers4NewsRec: A Transformer-based News Recommendation Framework

Oct 17, 2024

Dairui Liu, Honghui Du, Boming Yang, Neil Hurley, Aonghus Lawlor, Irene Li, Derek Greene, Ruihai Dong

Abstract:Pre-trained transformer models have shown great promise in various natural language processing tasks, including personalized news recommendations. To harness the power of these models, we introduce Transformers4NewsRec, a new Python framework built on the Transformers library. This framework is designed to unify and compare the performance of various news recommendation models, including deep neural networks and graph-based models. Transformers4NewsRec offers flexibility in terms of model selection, data preprocessing, and evaluation, allowing both quantitative and qualitative analysis.

Advancing Post-OCR Correction: A Comparative Study of Synthetic Data

Aug 05, 2024

Shuhao Guan, Derek Greene

Abstract:This paper explores the application of synthetic data in the post-OCR domain on multiple fronts by conducting experiments to assess the impact of data volume, augmentation, and synthetic data generation methods on model performance. Furthermore, we introduce a novel algorithm that leverages computer vision feature detection algorithms to calculate glyph similarity for constructing post-OCR synthetic data. Through experiments conducted across a variety of languages, including several low-resource ones, we demonstrate that models like ByT5 can significantly reduce Character Error Rates (CER) without the need for manually annotated data, and our proposed synthetic data generation method shows advantages over traditional methods, particularly in low-resource languages.

* ACL 2024 findings

Benchmark Data Contamination of Large Language Models: A Survey

Jun 06, 2024

Cheng Xu, Shuhao Guan, Derek Greene, M-Tahar Kechadi

Abstract:The rapid development of Large Language Models (LLMs) like GPT-4, Claude-3, and Gemini has transformed the field of natural language processing. However, it has also resulted in a significant issue known as Benchmark Data Contamination (BDC). This occurs when language models inadvertently incorporate evaluation benchmark information from their training data, leading to inaccurate or unreliable performance during the evaluation phase of the process. This paper reviews the complex challenge of BDC in LLM evaluation and explores alternative assessment methods to mitigate the risks associated with traditional benchmarks. The paper also examines challenges and future directions in mitigating BDC risks, highlighting the complexity of the issue and the need for innovative solutions to ensure the reliability of LLM evaluation in real-world applications.

* 31 pages, 7 figures, 3 tables

A Deep Learning Approach for Selective Relevance Feedback

Jan 20, 2024

Suchana Datta, Debasis Ganguly, Sean MacAvaney, Derek Greene

Abstract:Pseudo-relevance feedback (PRF) can enhance average retrieval effectiveness over a sufficiently large number of queries. However, PRF often introduces a drift into the original information need, thus hurting the retrieval effectiveness of several queries. While a selective application of PRF can potentially alleviate this issue, previous approaches have largely relied on unsupervised or feature-based learning to determine whether a query should be expanded. In contrast, we revisit the problem of selective PRF from a deep learning perspective, presenting a model that is entirely data-driven and trained in an end-to-end manner. The proposed model leverages a transformer-based bi-encoder architecture. Additionally, to further improve retrieval effectiveness with this selective PRF approach, we make use of the model's confidence estimates to combine the information from the original and expanded queries. In our experiments, we apply this selective feedback on a number of different combinations of ranking and feedback models, and show that our proposed approach consistently improves retrieval effectiveness for both sparse and dense ranking models, with the feedback models being either sparse, dense or generative.

RecPrompt: A Prompt Tuning Framework for News Recommendation Using Large Language Models

Dec 16, 2023

Dairui Liu, Boming Yang, Honghui Du, Derek Greene, Aonghus Lawlor, Ruihai Dong, Irene Li

Abstract:In the evolving field of personalized news recommendation, understanding the semantics of the underlying data is crucial. Large Language Models (LLMs) like GPT-4 have shown promising performance in understanding natural language. However, the extent of their applicability in news recommendation systems remains to be validated. This paper introduces RecPrompt, the first framework for news recommendation that leverages the capabilities of LLMs through prompt engineering. This system incorporates a prompt optimizer that applies an iterative bootstrapping process, enhancing the LLM-based recommender's ability to align news content with user preferences and interests more effectively. Moreover, this study offers insights into the effective use of LLMs in news recommendation, emphasizing both the advantages and the challenges of incorporating LLMs into recommendation systems.

* 8 pages, 3 figures, and 8 tables
