Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Marc Langheinrich

Neuron-Anchored Rule Extraction for Large Language Models via Contrastive Hierarchical Ablation

May 04, 2026

Francesco Sovrano, Gabriele Dominici, Marc Langheinrich

Abstract:A key goal of explainable AI (XAI) is to express the decision logic of large language models (LLMs) in symbolic form and link it to internal mechanisms. Global rule-extraction methods typically learn symbolic surrogates without grounding rules in model circuitry, while mechanistic interpretability can connect behaviors to neuron sets but often depends on hand-crafted hypotheses and expensive neuron-level interventions. We introduce MechaRule, a pipeline that grounds rule extraction in LLM circuits by efficiently localizing sparse neurons called agonists, whose activation neutralization disrupts rule-related behaviors. MechaRule rests on two empirical observations. First, within a fixed baseline/flip regime, sparse agonist effects can be approximately monotone and saturating: a few dominant neuron activations can overtop weaker ones at coarse scales, while overlapping neurons flip many of the same examples. This motivates viewing localization as adaptive group testing driven by a regime-conditional strength predicate with confidence-guided conservative pruning, yielding Theta(k log(N/k) + k) interventions over N candidates when k << N neurons are agonists under the monotone-overtopping abstraction. Second, agonists emerge more reliably when ablations are verified through data splits aligned with close-to-faithful rule behavior; spectral splits remain a useful rule-free fallback, while unfaithful splits degrade localization. Empirically, overtopping appears mainly in learned, task-aligned regimes: on arithmetic and jailbreak tasks across Qwen2 and GPT-J, MechaRule recalls 96.8% of high-effect brute-force agonists in completed comparisons, and suppressing localized agonists reduces arithmetic accuracy and jailbreak success by up to 71.1% and 8.8%, respectively.

Via

Access Paper or Ask Questions

In-Context Symbolic Regression for Robustness-Improved Kolmogorov-Arnold Networks

Mar 16, 2026

Francesco Sovrano, Lidia Losavio, Giulia Vilone, Marc Langheinrich

Abstract:Symbolic regression aims to replace black-box predictors with concise analytical expressions that can be inspected and validated in scientific machine learning. Kolmogorov-Arnold Networks (KANs) are well suited to this goal because each connection between adjacent units (an "edge") is parametrised by a learnable univariate function that can, in principle, be replaced by a symbolic operator. In practice, however, symbolic extraction is a bottleneck: the standard KAN-to-symbol approach fits operators to each learned edge function in isolation, making the discrete choice sensitive to initialisation and non-convex parameter fitting, and ignoring how local substitutions interact through the full network. We study in-context symbolic regression for operator extraction in KANs, and present two complementary instantiations. Greedy in-context Symbolic Regression (GSR) performs greedy, in-context selection by choosing edge replacements according to end-to-end loss improvement after brief fine-tuning. Gated Matching Pursuit (GMP) amortises this in-context selection by training a differentiable gated operator layer that places an operator library behind sparse gates on each edge; after convergence, gates are discretised (optionally followed by a short in-context greedy refinement pass). We quantify robustness via one-factor-at-a-time (OFAT) hyper-parameter sweeps and assess both predictive error and qualitative consistency of recovered formulas. Across several experiments, greedy in-context symbolic regression achieves up to 99.8% reduction in median OFAT test MSE.

* 24 pages; Accepted for publication at XAI'2026

Via

Access Paper or Ask Questions

ERIS: Enhancing Privacy and Communication Efficiency in Serverless Federated Learning

Feb 09, 2026

Dario Fenoglio, Pasquale Polverino, Jacopo Quizi, Martin Gjoreski, Marc Langheinrich

Abstract:Scaling federated learning (FL) to billion-parameter models introduces critical trade-offs between communication efficiency, model accuracy, and privacy guarantees. Existing solutions often tackle these challenges in isolation, sacrificing accuracy or relying on costly cryptographic tools. We propose ERIS, a serverless FL framework that balances privacy and accuracy while eliminating the server bottleneck and distributing the communication load. ERIS combines a model partitioning strategy, distributing aggregation across multiple client-side aggregators, with a distributed shifted gradient compression mechanism. We theoretically prove that ERIS (i) converges at the same rate as FedAvg under standard assumptions, and (ii) bounds mutual information leakage inversely with the number of aggregators, enabling strong privacy guarantees with no accuracy degradation. Experiments across image and text tasks, including large language models, confirm that ERIS achieves FedAvg-level accuracy while substantially reducing communication cost and improving robustness to membership inference and reconstruction attacks, without relying on heavy cryptography or noise injection.

Via

Access Paper or Ask Questions

Federated Learning with Profile Mapping under Distribution Shifts and Drifts

Feb 07, 2026

Mohan Li, Dario Fenoglio, Martin Gjoreski, Marc Langheinrich

Abstract:Federated Learning (FL) enables decentralized model training across clients without sharing raw data, but its performance degrades under real-world data heterogeneity. Existing methods often fail to address distribution shift across clients and distribution drift over time, or they rely on unrealistic assumptions such as known number of client clusters and data heterogeneity types, which limits their generalizability. We introduce Feroma, a novel FL framework that explicitly handles both distribution shift and drift without relying on client or cluster identity. Feroma builds on client distribution profiles-compact, privacy-preserving representations of local data-that guide model aggregation and test-time model assignment through adaptive similarity-based weighting. This design allows Feroma to dynamically select aggregation strategies during training, ranging from clustered to personalized, and deploy suitable models to unseen, and unlabeled test clients without retraining, online adaptation, or prior knowledge on clients' data. Extensive experiments show that compared to 10 state-of-the-art methods, Feroma improves performance and stability under dynamic data heterogeneity conditions-an average accuracy gain of up to 12 percentage points over the best baselines across 6 benchmarks-while maintaining computational and communication overhead comparable to FedAvg. These results highlight that distribution-profile-based aggregation offers a practical path toward robust FL under both data distribution shifts and drifts.

* ICLR2026

Via

Access Paper or Ask Questions

Federated Concept-Based Models: Interpretable models with distributed supervision

Feb 04, 2026

Dario Fenoglio, Arianna Casanova, Francesco De Santis, Mohan Li, Gabriele Dominici, Johannes Schneider, Martin Gjoreski, Marc Langheinrich, Pietro Barbiero, Giovanni De Felice

Abstract:Concept-based models (CMs) enhance interpretability in deep learning by grounding predictions in human-understandable concepts. However, concept annotations are expensive to obtain and rarely available at scale within a single data source. Federated learning (FL) could alleviate this limitation by enabling cross-institutional training that leverages concept annotations distributed across multiple data owners. Yet, FL lacks interpretable modeling paradigms. Integrating CMs with FL is non-trivial: CMs assume a fixed concept space and a predefined model architecture, whereas real-world FL is heterogeneous and non-stationary, with institutions joining over time and bringing new supervision. In this work, we propose Federated Concept-based Models (F-CMs), a new methodology for deploying CMs in evolving FL settings. F-CMs aggregate concept-level information across institutions and efficiently adapt the model architecture in response to changes in the available concept supervision, while preserving institutional privacy. Empirically, F-CMs preserve the accuracy and intervention effectiveness of training settings with full concept supervision, while outperforming non-adaptive federated baselines. Notably, F-CMs enable interpretable inference on concepts not available to a given institution, a key novelty with respect to existing approaches.

Via

Access Paper or Ask Questions

The Power of 10: New Rules for the Digital World

Jan 07, 2026

Sarah Spiekermann-Hoff, Marc Langheinrich, Johannes Hoff, Christiane Wendehorst, Jürgen Pfeffer, Thomas Fuchs, Armin Grunwald

Abstract:As artificial intelligence rapidly advances, society is increasingly captivated by promises of superhuman machines and seamless digital futures. Yet these visions often obscure mounting social, ethical, and psychological concerns tied to pervasive digital technologies - from surveillance to mental health crises. This article argues that a guiding ethos is urgently needed to navigate these transformations. Inspired by the lasting influence of the biblical Ten Commandments, a European interdisciplinary group has proposed "Ten Rules for the Digital World" - a novel ethical framework to help individuals and societies make prudent, human-centered decisions in the age of "supercharged" technology.

* to be published in Communications of the ACM (submitted 26 June 2025, revised 29 August 2025, accepted 3 November 2025)

Via

Access Paper or Ask Questions

On the interplay of Explainability, Privacy and Predictive Performance with Explanation-assisted Model Extraction

May 13, 2025

Fatima Ezzeddine, Rinad Akel, Ihab Sbeity, Silvia Giordano, Marc Langheinrich, Omran Ayoub

Abstract:Machine Learning as a Service (MLaaS) has gained important attraction as a means for deploying powerful predictive models, offering ease of use that enables organizations to leverage advanced analytics without substantial investments in specialized infrastructure or expertise. However, MLaaS platforms must be safeguarded against security and privacy attacks, such as model extraction (MEA) attacks. The increasing integration of explainable AI (XAI) within MLaaS has introduced an additional privacy challenge, as attackers can exploit model explanations particularly counterfactual explanations (CFs) to facilitate MEA. In this paper, we investigate the trade offs among model performance, privacy, and explainability when employing Differential Privacy (DP), a promising technique for mitigating CF facilitated MEA. We evaluate two distinct DP strategies: implemented during the classification model training and at the explainer during CF generation.

Via

Access Paper or Ask Questions

A Survey on Federated Learning in Human Sensing

Jan 07, 2025

Mohan Li, Martin Gjoreski, Pietro Barbiero, Gašper Slapničar, Mitja Luštrek, Nicholas D. Lane, Marc Langheinrich

Abstract:Human Sensing, a field that leverages technology to monitor human activities, psycho-physiological states, and interactions with the environment, enhances our understanding of human behavior and drives the development of advanced services that improve overall quality of life. However, its reliance on detailed and often privacy-sensitive data as the basis for its machine learning (ML) models raises significant legal and ethical concerns. The recently proposed ML approach of Federated Learning (FL) promises to alleviate many of these concerns, as it is able to create accurate ML models without sending raw user data to a central server. While FL has demonstrated its usefulness across a variety of areas, such as text prediction and cyber security, its benefits in Human Sensing are under-explored, given the particular challenges in this domain. This survey conducts a comprehensive analysis of the current state-of-the-art studies on FL in Human Sensing, and proposes a taxonomy and an eight-dimensional assessment for FL approaches. Through the eight-dimensional assessment, we then evaluate whether the surveyed studies consider a specific FL-in-Human-Sensing challenge or not. Finally, based on the overall analysis, we discuss open challenges and highlight five research aspects related to FL in Human Sensing that require urgent research attention. Our work provides a comprehensive corpus of FL studies and aims to assist FL practitioners in developing and evaluating solutions that effectively address the real-world complexities of Human Sensing.

Via

Access Paper or Ask Questions

Evaluating Explanations Through LLMs: Beyond Traditional User Studies

Oct 23, 2024

Francesco Bombassei De Bona, Gabriele Dominici, Tim Miller, Marc Langheinrich, Martin Gjoreski

Figure 1 for Evaluating Explanations Through LLMs: Beyond Traditional User Studies

Figure 2 for Evaluating Explanations Through LLMs: Beyond Traditional User Studies

Figure 3 for Evaluating Explanations Through LLMs: Beyond Traditional User Studies

Figure 4 for Evaluating Explanations Through LLMs: Beyond Traditional User Studies

Abstract:As AI becomes fundamental in sectors like healthcare, explainable AI (XAI) tools are essential for trust and transparency. However, traditional user studies used to evaluate these tools are often costly, time consuming, and difficult to scale. In this paper, we explore the use of Large Language Models (LLMs) to replicate human participants to help streamline XAI evaluation. We reproduce a user study comparing counterfactual and causal explanations, replicating human participants with seven LLMs under various settings. Our results show that (i) LLMs can replicate most conclusions from the original study, (ii) different LLMs yield varying levels of alignment in the results, and (iii) experimental factors such as LLM memory and output variability affect alignment with human responses. These initial findings suggest that LLMs could provide a scalable and cost-effective way to simplify qualitative XAI evaluation.

Via

Access Paper or Ask Questions

Causal Concept Embedding Models: Beyond Causal Opacity in Deep Learning

May 28, 2024

Gabriele Dominici, Pietro Barbiero, Mateo Espinosa Zarlenga, Alberto Termine, Martin Gjoreski, Giuseppe Marra, Marc Langheinrich

Figure 1 for Causal Concept Embedding Models: Beyond Causal Opacity in Deep Learning

Figure 2 for Causal Concept Embedding Models: Beyond Causal Opacity in Deep Learning

Figure 3 for Causal Concept Embedding Models: Beyond Causal Opacity in Deep Learning

Figure 4 for Causal Concept Embedding Models: Beyond Causal Opacity in Deep Learning

Abstract:Causal opacity denotes the difficulty in understanding the "hidden" causal structure underlying a deep neural network's (DNN) reasoning. This leads to the inability to rely on and verify state-of-the-art DNN-based systems especially in high-stakes scenarios. For this reason, causal opacity represents a key open challenge at the intersection of deep learning, interpretability, and causality. This work addresses this gap by introducing Causal Concept Embedding Models (Causal CEMs), a class of interpretable models whose decision-making process is causally transparent by design. The results of our experiments show that Causal CEMs can: (i) match the generalization performance of causally-opaque models, (ii) support the analysis of interventional and counterfactual scenarios, thereby improving the model's causal interpretability and supporting the effective verification of its reliability and fairness, and (iii) enable human-in-the-loop corrections to mispredicted intermediate reasoning steps, boosting not just downstream accuracy after corrections but also accuracy of the explanation provided for a specific instance.

Via

Access Paper or Ask Questions