Laines Schmalwasser

On the Faithfulness of Post-Hoc Concept Bottleneck Models

Jun 29, 2026

Laines Schmalwasser, Jan Blunk, Niklas Penzel, Julia Niebling, Joachim Denzler

Abstract: Human decision-making interprets the world through high-level concepts, such as recognizing a bird by its belly color. To bridge the gap between opaque deep learning representations and human understanding, Post-Hoc Concept Bottleneck Models (post-hoc CBMs) project latent features onto interpretable concept spaces using auxiliary datasets or vision-language models. However, relying on target task accuracy as the primary measure of post-hoc CBM success obscures whether the learned concepts are semantically meaningful or merely predictive artifacts. For example, random concept projections can achieve competitive accuracy despite being semantically meaningless. In this work, we analyze the learned projections directly and identify two failure cases. First, for concept projections learned from auxiliary data, covariate shifts can lead to unfaithful concept representations for the target task. In particular, we provide an upper bound on the error introduced by this shift. Second, systematic label noise in surrogate concept labels generated by vision-language models leads to unfaithful projections. After formalizing these failure modes, we introduce novel metrics that decouple concept faithfulness from predictive accuracy. Our empirical results across real-world and synthetic benchmarks confirm that these metrics identify unfaithful behaviors that standard accuracy-based evaluation fails to detect.

* Accepted at ECCV 2026, 41 pages, 13 figures, 2 tables
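
The remark about random concept projections can be illustrated with a toy sketch. It assumes synthetic 4-dimensional features, a hypothetical `project` helper that scores a feature vector against each concept vector by dot product, and a nearest-centroid classifier standing in for the linear head of a post-hoc CBM. None of these names or settings come from the paper.

```python
import random

def project(features, concept_vectors):
    """Concept scores: dot product of the feature vector with each concept vector."""
    return [sum(f * c for f, c in zip(features, vec)) for vec in concept_vectors]

def centroid(rows):
    """Coordinate-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_centroid_accuracy(train, test, concept_vectors):
    """Fit class centroids in concept-score space, then report test accuracy."""
    centroids = {
        label: centroid([project(x, concept_vectors) for x, y in train if y == label])
        for label in (0, 1)
    }
    correct = 0
    for x, y in test:
        scores = project(x, concept_vectors)
        predicted = min(centroids, key=lambda label: sq_dist(scores, centroids[label]))
        correct += predicted == y
    return correct / len(test)

def make_data(n, rng, dim=4):
    """Two synthetic classes separated along the first feature axis (toy assumption)."""
    data = []
    for i in range(n):
        label = i % 2
        x = [rng.gauss(0.0, 0.05) for _ in range(dim)]
        x[0] += 5.0 if label == 1 else -5.0
        data.append((x, label))
    return data

rng = random.Random(0)
train, test = make_data(200, rng), make_data(100, rng)

# "Meaningful" concepts in this toy: axis-aligned directions (identity projection).
aligned = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
# Random concepts: Gaussian directions with no semantic interpretation.
random_dirs = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(4)]

acc_aligned = nearest_centroid_accuracy(train, test, aligned)
acc_random = nearest_centroid_accuracy(train, test, random_dirs)
print(acc_aligned, acc_random)
```

In this toy setting both projections are expected to classify the well-separated classes almost perfectly, so task accuracy alone does not distinguish the interpretable directions from the random ones.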

FastCAV: Efficient Computation of Concept Activation Vectors for Explaining Deep Neural Networks

May 23, 2025

Laines Schmalwasser, Niklas Penzel, Joachim Denzler, Julia Niebling

Abstract: Concepts such as objects, patterns, and shapes are how humans understand the world. Building on this intuition, concept-based explainability methods aim to study representations learned by deep neural networks in relation to human-understandable concepts. Here, Concept Activation Vectors (CAVs) are an important tool and can identify whether a model learned a concept or not. However, the computational cost and time requirements of existing CAV computation pose a significant challenge, particularly in large-scale, high-dimensional architectures. To address this limitation, we introduce FastCAV, a novel approach that accelerates the extraction of CAVs by up to 63.6x (on average 46.4x). We provide a theoretical foundation for our approach and give concrete assumptions under which it is equivalent to established SVM-based methods. Our empirical results demonstrate that CAVs calculated with FastCAV maintain similar performance while being more efficient and stable. In downstream applications, i.e., concept-based explanation methods, we show that FastCAV can act as a replacement leading to equivalent insights. Hence, our approach enables previously infeasible investigations of deep models, which we demonstrate by tracking the evolution of concepts during model training.

* Accepted at ICML 2025, 27 pages, 20 figures, 9 tables
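
For readers unfamiliar with CAVs, the idea of a concept direction in activation space can be sketched as follows. A CAV is commonly obtained by fitting a linear classifier (for example an SVM) that separates activations of concept examples from activations of random examples. The sketch below swaps in a simple mean-difference direction as a cheap stand-in; it is an illustrative assumption and is not claimed to be the FastCAV procedure. The function names and the tiny 2-dimensional activations are hypothetical.

```python
import math

def mean_vector(rows):
    """Coordinate-wise mean of a list of equal-length activation vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def mean_difference_direction(concept_acts, random_acts):
    """Unit vector pointing from the random-example mean to the concept-example mean."""
    mu_c = mean_vector(concept_acts)
    mu_r = mean_vector(random_acts)
    diff = [c - r for c, r in zip(mu_c, mu_r)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        raise ValueError("concept and random activations have identical means")
    return [d / norm for d in diff]

def concept_score(activation, direction):
    """Projection of one activation vector onto the concept direction."""
    return sum(a * d for a, d in zip(activation, direction))

# Hypothetical activations from some layer (made-up numbers).
concept_acts = [[2.0, 0.0], [4.0, 0.0]]
random_acts = [[0.0, 0.0], [0.0, 2.0]]

direction = mean_difference_direction(concept_acts, random_acts)
concept_scores = [concept_score(a, direction) for a in concept_acts]
random_scores = [concept_score(a, direction) for a in random_acts]
print(direction, concept_scores, random_scores)
```

In this made-up example every concept activation scores higher along the direction than every random activation, which is the separation property a CAV is meant to capture.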

Exploiting Text-Image Latent Spaces for the Description of Visual Concepts

Oct 23, 2024

Laines Schmalwasser, Jakob Gawlikowski, Joachim Denzler, Julia Niebling

Abstract: Concept Activation Vectors (CAVs) offer insights into neural network decision-making by linking human-friendly concepts to the model's internal feature extraction process. However, when a new set of CAVs is discovered, they must still be translated into a human-understandable description. For image-based neural networks, this is typically done by visualizing the most relevant images of a CAV, while the determination of the concept is left to humans. In this work, we introduce an approach to aid the interpretation of newly discovered concept sets by suggesting textual descriptions for each CAV. This is done by mapping the most relevant images representing a CAV into a text-image embedding where a joint description of these relevant images can be computed. We propose encoding the most relevant receptive fields instead of full images. We demonstrate the capabilities of this approach in multiple experiments with and without given CAV labels, showing that the proposed approach provides accurate descriptions for the CAVs and reduces the challenge of concept interpretation.

* 19 pages, 7 figures, to be published in ICPR
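
The step of computing a joint description in a shared text-image space can be sketched roughly as below. The abstract does not say how the joint description is computed, so averaging the image embeddings and ranking candidate texts by cosine similarity is an assumption made for illustration. The `describe` helper, the candidate labels, and the 3-dimensional embeddings are hypothetical; a real setup would obtain embeddings from a text-image model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def describe(image_embeddings, text_embeddings, top_k=1):
    """Pool image embeddings by averaging, then rank candidate texts by similarity."""
    pooled = [sum(col) / len(image_embeddings) for col in zip(*image_embeddings)]
    ranked = sorted(
        text_embeddings,
        key=lambda label: cosine(pooled, text_embeddings[label]),
        reverse=True,
    )
    return ranked[:top_k]

# Made-up embeddings of the image regions most relevant to one CAV.
image_embeddings = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
# Made-up embeddings of candidate textual descriptions.
text_embeddings = {
    "striped": [1.0, 0.0, 0.0],
    "furry": [0.0, 1.0, 0.0],
    "metallic": [0.0, 0.0, 1.0],
}

suggestions = describe(image_embeddings, text_embeddings, top_k=2)
print(suggestions)
```

With these made-up vectors the pooled image embedding lies closest to the "striped" text embedding, so that label is suggested first.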
