Carsten Maple

DriveSafe: A Hierarchical Risk Taxonomy for Safety-Critical LLM-Based Driving Assistants

Jan 17, 2026

Abhishek Kumar, Riya Tapwal, Carsten Maple

Abstract:Large Language Models (LLMs) are increasingly integrated into vehicle-based digital assistants, where unsafe, ambiguous, or legally incorrect responses can lead to serious safety, ethical, and regulatory consequences. Despite growing interest in LLM safety, existing taxonomies and evaluation frameworks remain largely general-purpose and fail to capture the domain-specific risks inherent to real-world driving scenarios. In this paper, we introduce DriveSafe, a hierarchical, four-level risk taxonomy designed to systematically characterize safety-critical failure modes of LLM-based driving assistants. The taxonomy comprises 129 fine-grained atomic risk categories spanning technical, legal, societal, and ethical dimensions, grounded in real-world driving regulations and safety principles and reviewed by domain experts. To validate the safety relevance and realism of the constructed prompts, we evaluate their refusal behavior across six widely deployed LLMs. Our analysis shows that the evaluated models often fail to appropriately refuse unsafe or non-compliant driving-related queries, underscoring the limitations of general-purpose safety alignment in driving contexts.

Private Federated Multiclass Post-hoc Calibration

Oct 02, 2025

Samuel Maddock, Graham Cormode, Carsten Maple

Abstract:Calibrating machine learning models so that predicted probabilities better reflect the true outcome frequencies is crucial for reliable decision-making across many applications. In Federated Learning (FL), the goal is to train a global model on data which is distributed across multiple clients and cannot be centralized due to privacy concerns. FL is applied in key areas such as healthcare and finance where calibration is strongly required, yet federated private calibration has been largely overlooked. This work introduces the integration of post-hoc model calibration techniques within FL. Specifically, we transfer traditional centralized calibration methods such as histogram binning and temperature scaling into federated environments and define new methods to operate them under strong client heterogeneity. We study (1) a federated setting and (2) a user-level Differential Privacy (DP) setting and demonstrate how both federation and DP impact calibration accuracy. We propose strategies to mitigate the degradation commonly observed under heterogeneity. Our findings highlight that our federated temperature scaling works best for DP-FL, whereas our weighted binning approach is best when DP is not required.
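
For readers unfamiliar with temperature scaling, the following is a minimal sketch of the standard centralized version only: a single scalar T divides the logits, and T is chosen to minimise the negative log-likelihood on held-out data. It is not the federated or differentially private method from the paper. The function names, the grid search, and the toy logits are all assumptions made for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax of logits divided by a temperature (numerically stabilised)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nll(logit_rows, labels, temperature):
    """Mean negative log-likelihood of the true labels at a given temperature."""
    loss = 0.0
    for logits, y in zip(logit_rows, labels):
        loss -= math.log(softmax(logits, temperature)[y])
    return loss / len(labels)

def fit_temperature(logit_rows, labels, grid=None):
    """Pick the temperature on a grid that minimises validation NLL.

    A grid search is used here for simplicity (an assumption); gradient-based
    optimisers are a common alternative.
    """
    if grid is None:
        grid = [0.05 * k for k in range(1, 201)]  # 0.05, 0.10, ..., 10.0
    return min(grid, key=lambda t: nll(logit_rows, labels, t))

# Toy validation set (hypothetical): very confident logits, one of four wrong.
val_logits = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [4.0, 0.0, 0.0], [0.0, 0.0, 4.0]]
val_labels = [0, 1, 1, 2]

T = fit_temperature(val_logits, val_labels)
calibrated = softmax(val_logits[0], T)
```

Because the toy model is overconfident, the fitted temperature is above 1, which softens the predicted probabilities without changing the predicted class.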

Individualised Counterfactual Examples Using Conformal Prediction Intervals

May 28, 2025

James M. Adams, Gesine Reinert, Lukasz Szpruch, Carsten Maple, Andrew Elliott

Abstract:Counterfactual explanations for black-box models aim to provide insight into an algorithmic decision to its recipient. For a binary classification problem an individual counterfactual details which features might be changed for the model to infer the opposite class. High-dimensional feature spaces that are typical of machine learning classification models admit many possible counterfactual examples to a decision, and so it is important to identify additional criteria to select the most useful counterfactuals. In this paper, we explore the idea that the counterfactuals should be maximally informative when considering the knowledge of a specific individual about the underlying classifier. To quantify this information gain we explicitly model the knowledge of the individual, and assess the uncertainty of predictions which the individual makes by the width of a conformal prediction interval. Regions of feature space where the prediction interval is wide correspond to areas where the confidence in decision making is low, and an additional counterfactual example might be more informative to an individual. To explore and evaluate our individualised conformal prediction interval counterfactuals (CPICFs), first we present a synthetic data set on a hypercube which allows us to fully visualise the decision boundary, conformal intervals via three different methods, and resultant CPICFs. Second, in this synthetic data set we explore the impact of a single CPICF on the knowledge of an individual locally around the original query. Finally, in both our synthetic data set and a complex real world dataset with a combination of continuous and discrete variables, we measure the utility of these counterfactuals via data augmentation, testing the performance on a held out set.
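
To make the idea of a prediction interval whose width varies across feature space concrete, here is a minimal sketch of one well-known construction, normalised split conformal prediction. The abstract does not say which three methods the paper uses, so this should be read as a generic illustration rather than the authors' procedure. The predictor `predict`, the difficulty estimate `difficulty`, and the calibration data are hypothetical.

```python
import math

def conformal_quantile(scores, alpha):
    """Finite-sample conformal quantile: the ceil((n+1)(1-alpha))-th smallest score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[k - 1]

def normalized_interval(predict, difficulty, calib, alpha, x):
    """Interval predict(x) +/- q * difficulty(x), calibrated on held-out (x, y) pairs."""
    scores = [abs(y - predict(xi)) / difficulty(xi) for xi, y in calib]
    q = conformal_quantile(scores, alpha)
    centre, half = predict(x), q * difficulty(x)
    return centre - half, centre + half

# Hypothetical model and difficulty estimate (assumptions for illustration).
predict = lambda x: 2.0 * x
difficulty = lambda x: 1.0 + abs(x)

# Deterministic toy calibration set whose residuals grow with |x|.
calib = []
for i in range(20):
    xi = i / 10.0
    sign = 1.0 if i % 2 == 0 else -1.0
    calib.append((xi, 2.0 * xi + sign * 0.1 * (1.0 + xi)))

lo0, hi0 = normalized_interval(predict, difficulty, calib, 0.1, 0.0)
lo3, hi3 = normalized_interval(predict, difficulty, calib, 0.1, 3.0)
width_near, width_far = hi0 - lo0, hi3 - lo3
```

In this toy setup the interval at x = 3 is wider than at x = 0, which is the kind of signal the abstract describes using to locate regions where an extra counterfactual may be more informative.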

* Submitted to Conformal and Probabilistic Predictions With Applications (COPA) 2025

Justified Evidence Collection for Argument-based AI Fairness Assurance

May 12, 2025

Alpay Sabuncuoglu, Christopher Burr, Carsten Maple

Abstract:It is well recognised that ensuring fair AI systems is a complex sociotechnical challenge, which requires careful deliberation and continuous oversight across all stages of a system's lifecycle, from defining requirements to model deployment and deprovisioning. Dynamic argument-based assurance cases, which present structured arguments supported by evidence, have emerged as a systematic approach to evaluating and mitigating safety risks and hazards in AI-enabled system development and have also been extended to deal with broader normative goals such as fairness and explainability. This paper introduces a systems-engineering-driven framework, supported by software tooling, to operationalise a dynamic approach to argument-based assurance in two stages. In the first stage, during the requirements planning phase, a multi-disciplinary and multi-stakeholder team define goals and claims to be established (and evidenced) by conducting a comprehensive fairness governance process. In the second stage, a continuous monitoring interface gathers evidence from existing artefacts (e.g. metrics from automated tests), such as model, data, and use case documentation, to support these arguments dynamically. The framework's effectiveness is demonstrated through an illustrative case study in finance, with a focus on supporting fairness-related arguments.

* The paper is accepted for ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT '25)

Representation Engineering for Large-Language Models: Survey and Research Challenges

Feb 24, 2025

Lukasz Bartoszcze, Sarthak Munshi, Bryan Sukidi, Jennifer Yen, Zejia Yang, David Williams-King, Linh Le, Kosi Asuzu, Carsten Maple

Abstract:Large-language models are capable of completing a variety of tasks, but remain unpredictable and intractable. Representation engineering seeks to resolve this problem through a new approach utilizing samples of contrasting inputs to detect and edit high-level representations of concepts such as honesty, harmfulness or power-seeking. We formalize the goals and methods of representation engineering to present a cohesive picture of work in this emerging discipline. We compare it with alternative approaches, such as mechanistic interpretability, prompt-engineering and fine-tuning. We outline risks such as performance decrease, compute time increases and steerability issues. We present a clear agenda for future research to build predictable, dynamic, safe and personalizable LLMs.

Distributed, communication-efficient, and differentially private estimation of KL divergence

Nov 25, 2024

Mary Scott, Sayan Biswas, Graham Cormode, Carsten Maple

Abstract:A key task in managing distributed, sensitive data is to measure the extent to which a distribution changes. Understanding this drift can effectively support a variety of federated learning and analytics tasks. However, in many practical settings sharing such information can be undesirable (e.g., for privacy concerns) or infeasible (e.g., for high communication costs). In this work, we describe novel algorithmic approaches for estimating the KL divergence of data across federated models of computation, under differential privacy. We analyze their theoretical properties and present an empirical study of their performance. We explore parameter settings that optimize the accuracy of the algorithm catering to each of the settings; these provide sub-variations that are applicable to real-world tasks, addressing different context- and application-specific trust level requirements. Our experimental results confirm that our private estimators achieve accuracy comparable to a baseline algorithm without differential privacy guarantees.
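
As a rough illustration of the problem setting, the sketch below estimates the KL divergence between two parties' data when each party releases only a Laplace-noised histogram. This is a simple baseline under stated assumptions, not the estimators proposed in the paper: it assumes each record falls in exactly one bin (so adding or removing a record changes the histogram by at most 1 in L1 norm), and the function names, the clipping floor, and the counts are hypothetical.

```python
import math
import random

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions with q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def laplace_noise(scale, rng):
    """Laplace(0, scale) sample as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_distribution(counts, epsilon, rng, floor=1e-6):
    """Add Laplace(1/epsilon) noise per bin, clip to a small floor, normalise.

    Clipping and normalising are post-processing, so they do not weaken the
    epsilon-DP guarantee of the noisy histogram (under the assumption above).
    """
    noisy = [max(c + laplace_noise(1.0 / epsilon, rng), floor) for c in counts]
    total = sum(noisy)
    return [v / total for v in noisy]

def private_kl(counts_p, counts_q, epsilon, rng):
    """Plug-in KL estimate computed from two independently privatised histograms."""
    p = private_distribution(counts_p, epsilon, rng)
    q = private_distribution(counts_q, epsilon, rng)
    return kl_divergence(p, q)

# Hypothetical bin counts held by two parties.
counts_p = [500, 300, 200]
counts_q = [200, 300, 500]

exact = kl_divergence([c / 1000 for c in counts_p], [c / 1000 for c in counts_q])
estimate = private_kl(counts_p, counts_q, epsilon=1.0, rng=random.Random(0))
```

With counts in the hundreds and epsilon = 1, the noise is small relative to the data, so the private estimate stays close to the non-private value; with small counts or small epsilon the plug-in estimate can be badly biased, which is part of what motivates more careful estimators.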

* 28 pages, 5 figures

Towards Robust Federated Analytics via Differentially Private Measurements of Statistical Heterogeneity

Nov 07, 2024

Mary Scott, Graham Cormode, Carsten Maple

Abstract:Statistical heterogeneity is a measure of how skewed the samples of a dataset are. It is a common problem in the study of differential privacy that the usage of a statistically heterogeneous dataset results in a significant loss of accuracy. In federated scenarios, statistical heterogeneity is more likely to happen, and so the above problem is even more pressing. We explore the three most promising ways to measure statistical heterogeneity and give formulae for their accuracy, while simultaneously incorporating differential privacy. We find the optimum privacy parameters via an analytic mechanism, which incorporates root finding methods. We validate the main theorems and related hypotheses experimentally, and test the robustness of the analytic mechanism to different heterogeneity levels. The analytic mechanism in a distributed setting delivers superior accuracy to all combinations involving the classic mechanism and/or the centralized setting. None of the measures of statistical heterogeneity loses significant accuracy when a heterogeneous sample is used.

* 26 pages, 6 tables, 1 figure

AI security and cyber risk in IoT systems

Oct 11, 2024

Petar Radanliev, David De Roure, Carsten Maple, Jason R. C. Nurse, Razvan Nicolescu, Uchenna Ani

Abstract:We present a dependency model tailored to the context of current challenges in data strategies and make recommendations for the cybersecurity community. The model can be used for cyber risk estimation and assessment and generic risk impact assessment.

A BERT-based Empirical Study of Privacy Policies' Compliance with GDPR

Jul 09, 2024

Lu Zhang, Nabil Moukafih, Hamad Alamri, Gregory Epiphaniou, Carsten Maple

Abstract:Since its implementation in May 2018, the General Data Protection Regulation (GDPR) has prompted businesses to revisit and revise their data handling practices to ensure compliance. The privacy policy, which serves as the primary means of informing users about their privacy rights and the data practices of companies, has been significantly updated by numerous businesses post-GDPR implementation. However, many privacy policies remain packed with technical jargon, lengthy explanations, and vague descriptions of data practices and user rights. This makes it a challenging task for users and regulatory authorities to manually verify the GDPR compliance of these privacy policies. In this study, we aim to address the challenge of compliance analysis between GDPR (Article 13) and privacy policies for 5G networks. We manually collected privacy policies from almost 70 different 5G MNOs, and we utilized an automated BERT-based model for classification. We show that an encouraging 51% of companies demonstrate a strong adherence to GDPR. In addition, we present the first study that provides current empirical evidence on the readability of privacy policies for 5G networks. We adopted a readability analysis toolset that incorporates various established readability metrics. The findings empirically show that the readability of the majority of current privacy policies remains a significant challenge. Hence, 5G providers need to invest considerable effort into revising these documents to enhance both their utility and the overall user experience.

* Published in IEEE Conference on Communications and Network Security (CNS), 2023

Representation noising effectively prevents harmful fine-tuning on LLMs

May 23, 2024

Domenic Rosati, Jan Wehner, Kai Williams, Łukasz Bartoszcze, David Atanasov, Robie Gonzales, Subhabrata Majumdar, Carsten Maple, Hassan Sajjad, Frank Rudzicz

Abstract:Releasing open-source large language models (LLMs) presents a dual-use risk since bad actors can easily fine-tune these models for harmful purposes. Even without the open release of weights, weight stealing and fine-tuning APIs make closed models vulnerable to harmful fine-tuning attacks (HFAs). While safety measures like preventing jailbreaks and improving safety guardrails are important, such measures can easily be reversed through fine-tuning. In this work, we propose Representation Noising (RepNoise), a defence mechanism that is effective even when attackers have access to the weights and the defender no longer has any control. RepNoise works by removing information about harmful representations such that it is difficult to recover them during fine-tuning. Importantly, our defence is also able to generalize across different subsets of harm that have not been seen during the defence process. Our method does not degrade the general capability of LLMs and retains the ability to train the model on harmless tasks. We provide empirical evidence that the effectiveness of our defence lies in its "depth": the degree to which information about harmful representations is removed across all layers of the LLM.
