Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Sebastian Lapuschkin

A Fresh Look at Sanity Checks for Saliency Maps

May 03, 2024

Anna Hedström, Leander Weber, Sebastian Lapuschkin, Marina Höhne

Abstract:The Model Parameter Randomisation Test (MPRT) is highly recognised in the eXplainable Artificial Intelligence (XAI) community due to its fundamental evaluative criterion: explanations should be sensitive to the parameters of the model they seek to explain. However, recent studies have raised several methodological concerns for the empirical interpretation of MPRT. In response, we propose two modifications to the original test: Smooth MPRT and Efficient MPRT. The former reduces the impact of noise on evaluation outcomes via sampling, while the latter avoids the need for biased similarity measurements by re-interpreting the test through the increase in explanation complexity after full model randomisation. Our experiments show that these modifications enhance the metric reliability, facilitating a more trustworthy deployment of explanation methods.

* arXiv admin note: text overlap with arXiv:2401.06465

Via

Access Paper or Ask Questions

Explainable concept mappings of MRI: Revealing the mechanisms underlying deep learning-based brain disease classification

Apr 16, 2024

Christian Tinauer, Anna Damulina, Maximilian Sackl, Martin Soellradl, Reduan Achtibat, Maximilian Dreyer, Frederik Pahde, Sebastian Lapuschkin, Reinhold Schmidt, Stefan Ropele(+2 more)

Figure 1 for Explainable concept mappings of MRI: Revealing the mechanisms underlying deep learning-based brain disease classification

Figure 2 for Explainable concept mappings of MRI: Revealing the mechanisms underlying deep learning-based brain disease classification

Figure 3 for Explainable concept mappings of MRI: Revealing the mechanisms underlying deep learning-based brain disease classification

Figure 4 for Explainable concept mappings of MRI: Revealing the mechanisms underlying deep learning-based brain disease classification

Abstract:Motivation. While recent studies show high accuracy in the classification of Alzheimer's disease using deep neural networks, the underlying learned concepts have not been investigated. Goals. To systematically identify changes in brain regions through concepts learned by the deep neural network for model validation. Approach. Using quantitative R2* maps we separated Alzheimer's patients (n=117) from normal controls (n=219) by using a convolutional neural network and systematically investigated the learned concepts using Concept Relevance Propagation and compared these results to a conventional region of interest-based analysis. Results. In line with established histological findings and the region of interest-based analyses, highly relevant concepts were primarily found in and adjacent to the basal ganglia. Impact. The identification of concepts learned by deep neural networks for disease classification enables validation of the models and could potentially improve reliability.

Via

Access Paper or Ask Questions

Reactive Model Correction: Mitigating Harm to Task-Relevant Features via Conditional Bias Suppression

Apr 15, 2024

Dilyara Bareeva, Maximilian Dreyer, Frederik Pahde, Wojciech Samek, Sebastian Lapuschkin

Figure 1 for Reactive Model Correction: Mitigating Harm to Task-Relevant Features via Conditional Bias Suppression

Figure 2 for Reactive Model Correction: Mitigating Harm to Task-Relevant Features via Conditional Bias Suppression

Figure 3 for Reactive Model Correction: Mitigating Harm to Task-Relevant Features via Conditional Bias Suppression

Figure 4 for Reactive Model Correction: Mitigating Harm to Task-Relevant Features via Conditional Bias Suppression

Abstract:Deep Neural Networks are prone to learning and relying on spurious correlations in the training data, which, for high-risk applications, can have fatal consequences. Various approaches to suppress model reliance on harmful features have been proposed that can be applied post-hoc without additional training. Whereas those methods can be applied with efficiency, they also tend to harm model performance by globally shifting the distribution of latent features. To mitigate unintended overcorrection of model behavior, we propose a reactive approach conditioned on model-derived knowledge and eXplainable Artificial Intelligence (XAI) insights. While the reactive approach can be applied to many post-hoc methods, we demonstrate the incorporation of reactivity in particular for P-ClArC (Projective Class Artifact Compensation), introducing a new method called R-ClArC (Reactive Class Artifact Compensation). Through rigorous experiments in controlled settings (FunnyBirds) and with a real-world dataset (ISIC2019), we show that introducing reactivity can minimize the detrimental effect of the applied correction while simultaneously ensuring low reliance on spurious features.

Via

Access Paper or Ask Questions

PURE: Turning Polysemantic Neurons Into Pure Features by Identifying Relevant Circuits

Apr 09, 2024

Maximilian Dreyer, Erblina Purelku, Johanna Vielhaben, Wojciech Samek, Sebastian Lapuschkin

Figure 1 for PURE: Turning Polysemantic Neurons Into Pure Features by Identifying Relevant Circuits

Figure 2 for PURE: Turning Polysemantic Neurons Into Pure Features by Identifying Relevant Circuits

Figure 3 for PURE: Turning Polysemantic Neurons Into Pure Features by Identifying Relevant Circuits

Figure 4 for PURE: Turning Polysemantic Neurons Into Pure Features by Identifying Relevant Circuits

Abstract:The field of mechanistic interpretability aims to study the role of individual neurons in Deep Neural Networks. Single neurons, however, have the capability to act polysemantically and encode for multiple (unrelated) features, which renders their interpretation difficult. We present a method for disentangling polysemanticity of any Deep Neural Network by decomposing a polysemantic neuron into multiple monosemantic "virtual" neurons. This is achieved by identifying the relevant sub-graph ("circuit") for each "pure" feature. We demonstrate how our approach allows us to find and disentangle various polysemantic units of ResNet models trained on ImageNet. While evaluating feature visualizations using CLIP, our method effectively disentangles representations, improving upon methods based on neuron activations. Our code is available at https://github.com/maxdreyer/PURE.

* 14 pages (4 pages manuscript, 2 pages references, 8 pages appendix)

Via

Access Paper or Ask Questions

DualView: Data Attribution from the Dual Perspective

Feb 19, 2024

Galip Ümit Yolcu, Thomas Wiegand, Wojciech Samek, Sebastian Lapuschkin

Abstract:Local data attribution (or influence estimation) techniques aim at estimating the impact that individual data points seen during training have on particular predictions of an already trained Machine Learning model during test time. Previous methods either do not perform well consistently across different evaluation criteria from literature, are characterized by a high computational demand, or suffer from both. In this work we present DualView, a novel method for post-hoc data attribution based on surrogate modelling, demonstrating both high computational efficiency, as well as good evaluation results. With a focus on neural networks, we evaluate our proposed technique using suitable quantitative evaluation strategies from the literature against related principal local data attribution methods. We find that DualView requires considerably lower computational resources than other methods, while demonstrating comparable performance to competing approaches across evaluation metrics. Futhermore, our proposed method produces sparse explanations, where sparseness can be tuned via a hyperparameter. Finally, we showcase that with DualView, we can now render explanations from local data attributions compatible with established local feature attribution methods: For each prediction on (test) data points explained in terms of impactful samples from the training set, we are able to compute and visualize how the prediction on (test) sample relates to each influential training sample in terms of features recognized and by the model. We provide an Open Source implementation of DualView online, together with implementations for all other local data attribution methods we compare against, as well as the metrics reported here, for full reproducibility.

Via

Access Paper or Ask Questions

AttnLRP: Attention-Aware Layer-wise Relevance Propagation for Transformers

Feb 08, 2024

Reduan Achtibat, Sayed Mohammad Vakilzadeh Hatefi, Maximilian Dreyer, Aakriti Jain, Thomas Wiegand, Sebastian Lapuschkin, Wojciech Samek

Figure 1 for AttnLRP: Attention-Aware Layer-wise Relevance Propagation for Transformers

Figure 2 for AttnLRP: Attention-Aware Layer-wise Relevance Propagation for Transformers

Figure 3 for AttnLRP: Attention-Aware Layer-wise Relevance Propagation for Transformers

Figure 4 for AttnLRP: Attention-Aware Layer-wise Relevance Propagation for Transformers

Abstract:Large Language Models are prone to biased predictions and hallucinations, underlining the paramount importance of understanding their model-internal reasoning process. However, achieving faithful attributions for the entirety of a black-box transformer model and maintaining computational efficiency is an unsolved challenge. By extending the Layer-wise Relevance Propagation attribution method to handle attention layers, we address these challenges effectively. While partial solutions exist, our method is the first to faithfully and holistically attribute not only input but also latent representations of transformer models with the computational efficiency similar to a singular backward pass. Through extensive evaluations against existing methods on Llama 2, Flan-T5 and the Vision Transformer architecture, we demonstrate that our proposed approach surpasses alternative methods in terms of faithfulness and enables the understanding of latent representations, opening up the door for concept-based explanations. We provide an open-source implementation on GitHub https://github.com/rachtibat/LRP-for-Transformers.

Via

Access Paper or Ask Questions

Explaining Predictive Uncertainty by Exposing Second-Order Effects

Jan 30, 2024

Florian Bley, Sebastian Lapuschkin, Wojciech Samek, Grégoire Montavon

Abstract:Explainable AI has brought transparency into complex ML blackboxes, enabling, in particular, to identify which features these models use for their predictions. So far, the question of explaining predictive uncertainty, i.e. why a model 'doubts', has been scarcely studied. Our investigation reveals that predictive uncertainty is dominated by second-order effects, involving single features or product interactions between them. We contribute a new method for explaining predictive uncertainty based on these second-order effects. Computationally, our method reduces to a simple covariance computation over a collection of first-order explanations. Our method is generally applicable, allowing for turning common attribution techniques (LRP, Gradient x Input, etc.) into powerful second-order uncertainty explainers, which we call CovLRP, CovGI, etc. The accuracy of the explanations our method produces is demonstrated through systematic quantitative evaluations, and the overall usefulness of our method is demonstrated via two practical showcases.

* 12 pages + supplement

Via

Access Paper or Ask Questions

Sanity Checks Revisited: An Exploration to Repair the Model Parameter Randomisation Test

Jan 12, 2024

Anna Hedström, Leander Weber, Sebastian Lapuschkin, Marina MC Höhne

Abstract:The Model Parameter Randomisation Test (MPRT) is widely acknowledged in the eXplainable Artificial Intelligence (XAI) community for its well-motivated evaluative principle: that the explanation function should be sensitive to changes in the parameters of the model function. However, recent works have identified several methodological caveats for the empirical interpretation of MPRT. To address these caveats, we introduce two adaptations to the original MPRT -- Smooth MPRT and Efficient MPRT, where the former minimises the impact that noise has on the evaluation results through sampling and the latter circumvents the need for biased similarity measurements by re-interpreting the test through the explanation's rise in complexity, after full parameter randomisation. Our experimental results demonstrate that these proposed variants lead to improved metric reliability, thus enabling a more trustworthy application of XAI methods.

* 19 pages, 12 figures, NeurIPS XAIA 2023

Via

Access Paper or Ask Questions

Understanding the (Extra-)Ordinary: Validating Deep Model Decisions with Prototypical Concept-based Explanations

Nov 28, 2023

Maximilian Dreyer, Reduan Achtibat, Wojciech Samek, Sebastian Lapuschkin

Figure 1 for Understanding the (Extra-)Ordinary: Validating Deep Model Decisions with Prototypical Concept-based Explanations

Figure 2 for Understanding the (Extra-)Ordinary: Validating Deep Model Decisions with Prototypical Concept-based Explanations

Figure 3 for Understanding the (Extra-)Ordinary: Validating Deep Model Decisions with Prototypical Concept-based Explanations

Figure 4 for Understanding the (Extra-)Ordinary: Validating Deep Model Decisions with Prototypical Concept-based Explanations

Abstract:Ensuring both transparency and safety is critical when deploying Deep Neural Networks (DNNs) in high-risk applications, such as medicine. The field of explainable AI (XAI) has proposed various methods to comprehend the decision-making processes of opaque DNNs. However, only few XAI methods are suitable of ensuring safety in practice as they heavily rely on repeated labor-intensive and possibly biased human assessment. In this work, we present a novel post-hoc concept-based XAI framework that conveys besides instance-wise (local) also class-wise (global) decision-making strategies via prototypes. What sets our approach apart is the combination of local and global strategies, enabling a clearer understanding of the (dis-)similarities in model decisions compared to the expected (prototypical) concept use, ultimately reducing the dependence on human long-term assessment. Quantifying the deviation from prototypical behavior not only allows to associate predictions with specific model sub-strategies but also to detect outlier behavior. As such, our approach constitutes an intuitive and explainable tool for model validation. We demonstrate the effectiveness of our approach in identifying out-of-distribution samples, spurious model behavior and data quality issues across three datasets (ImageNet, CUB-200, and CIFAR-10) utilizing VGG, ResNet, and EfficientNet architectures. Code is available on https://github.com/maxdreyer/pcx.

* 37 pages (9 pages manuscript, 2 pages references, 26 pages appendix)

Via

Access Paper or Ask Questions

Generative Fractional Diffusion Models

Oct 26, 2023

Gabriel Nobis, Marco Aversa, Maximilian Springenberg, Michael Detzel, Stefano Ermon, Shinichi Nakajima, Roderick Murray-Smith, Sebastian Lapuschkin, Christoph Knochenhauer, Luis Oala(+1 more)

Figure 1 for Generative Fractional Diffusion Models

Figure 2 for Generative Fractional Diffusion Models

Figure 3 for Generative Fractional Diffusion Models

Figure 4 for Generative Fractional Diffusion Models

Abstract:We generalize the continuous time framework for score-based generative models from an underlying Brownian motion (BM) to an approximation of fractional Brownian motion (FBM). We derive a continuous reparameterization trick and the reverse time model by representing FBM as a stochastic integral over a family of Ornstein-Uhlenbeck processes to define generative fractional diffusion models (GFDM) with driving noise converging to a non-Markovian process of infinite quadratic variation. The Hurst index $H\in(0,1)$ of FBM enables control of the roughness of the distribution transforming path. To the best of our knowledge, this is the first attempt to build a generative model upon a stochastic process with infinite quadratic variation.

Via

Access Paper or Ask Questions