Abstract:Opinion and multi-document summarisation often involve genuinely conflicting viewpoints, yet many existing approaches, particularly LLM-based systems, implicitly smooth disagreement and over-represent majority opinions. This limits the faithfulness of generated summaries in opinion-heavy settings. We introduce a disagreement-aware synthesis pipeline that separates belief-level aggregation from language generation. Documents are first represented as structured belief sets and aggregated using distance-based belief merging operators that explicitly model conflict. Large language models are then used only to realise the aggregated beliefs as natural language summaries. We evaluate the approach across multiple model families and scales, comparing it to methods that perform explicit aggregation during generation. Our results show that while sufficiently large models can match belief-level aggregation when aggregation is handled at generation time, this behaviour is not stable across architectures or capacities. In contrast, belief-level aggregation combined with simple prompting yields consistently strong disagreement-aware performance across models, while maintaining fluent and grounded summaries.
Abstract:Clinical interventions often hinge on age: medications and procedures safe for adults may be harmful to children or ineffective for older adults. However, as language models are increasingly integrated into biomedical evidence synthesis workflows, it remains uncertain whether these systems preserve such crucial demographic distinctions. To address this gap, we evaluate how well state-of-the-art language models retain age-related information when generating abstractive summaries of biomedical studies. We construct DemogSummary, a novel age-stratified dataset of systematic review primary studies, covering child, adult, and older adult populations. We evaluate three prominent summarisation-capable LLMs, Qwen (open-source), Longformer (open-source) and GPT-4.1 Nano (proprietary), using both standard metrics and a newly proposed Demographic Salience Score (DSS), which quantifies age-related entity retention and hallucination. Our results reveal systematic disparities across models and age groups: demographic fidelity is lowest for adult-focused summaries, and under-represented populations are more prone to hallucinations. These findings highlight the limitations of current LLMs in faithful and bias-free summarisation and point to the need for fairness-aware evaluation frameworks and summarisation pipelines in biomedical NLP.
Abstract:AI models are rapidly becoming embedded in all aspects of nuclear energy research and work but the safety, security, and safeguards consequences of this embedding are not well understood. In this paper, we call for the creation of an anticipatory system of governance for AI in the nuclear sector as well as the creation of a global AI observatory as a means for operationalizing anticipatory governance. The paper explores the contours of the nuclear AI observatory and an anticipatory system of governance by drawing on work in science and technology studies, public policy, and foresight studies.
Abstract:Recent research has highlighted the potential of linking predictive and prescriptive analytics. However, it remains widely unexplored how both paradigms could benefit from one another to address today's major challenges in healthcare. One of these is smarter planning of resource capacities for frail and elderly inpatient wards, addressing the societal challenge of an aging population. Frail and elderly patients typically suffer from multimorbidity and require more care while receiving medical treatment. The aim of this research is to assess how various predictive and prescriptive analytical methods, both individually and in tandem, contribute to addressing the operational challenges within an area of healthcare that is growing in demand. Clinical and demographic patient attributes are gathered from more than 165,000 patient records and used to explain and predict length of stay. To that extent, we employ Classification and Regression Trees (CART) analysis to establish this relationship. On the prescriptive side, deterministic and two-stage stochastic programs are developed to determine how to optimally plan for beds and ward staff with the objective to minimize cost. Furthermore, the two analytical methodologies are linked by generating demand for the prescriptive models using the CART groupings. The results show the linked methodologies provided different but similar results compared to using averages and in doing so, captured a more realistic real-world variation in the patient length of stay. Our research reveals that healthcare managers should consider using predictive and prescriptive models to make more informed decisions. By combining predictive and prescriptive analytics, healthcare managers can move away from relying on averages and incorporate the unique characteristics of their patients to create more robust planning decisions, mitigating risks caused by variations in demand.