Abstract:Graph Neural Networks (GNNs) have emerged as an efficient alternative to convolutional approaches for vision tasks such as image classification, leveraging patch-based representations instead of raw pixels. These methods construct graphs where image patches serve as nodes, and edges are established based on patch similarity or classification relevance. Despite their efficiency, the explainability of GNN-based vision models remains underexplored, even though graphs are naturally interpretable. In this work, we analyze the semantic consistency of the graphs formed at different layers of GNN-based image classifiers, focusing on how well they preserve object structures and meaningful relationships. A comprehensive analysis is presented by quantifying the extent to which inter-layer graph connections reflect semantic similarity and spatial coherence. Explanations from standard and adversarial settings are also compared to assess whether they reflect the classifiers' robustness. Additionally, we visualize the flow of information across layers through heatmap-based visualization techniques, thereby highlighting the models' explainability. Our findings demonstrate that the decision-making processes of these models can be effectively explained, while also revealing that their reasoning does not necessarily align with human perception, especially in deeper layers.
Abstract:Counterfactuals have been established as a popular explainability technique which leverages a set of minimal edits to alter the prediction of a classifier. When considering conceptual counterfactuals, the edits requested should correspond to salient concepts present in the input data. At the same time, conceptual distances are defined by knowledge graphs, ensuring the optimality of conceptual edits. In this work, we extend previous endeavors on conceptual counterfactuals by introducing \textit{graph edits as counterfactual explanations}: should we represent input data as graphs, which is the shortest graph edit path that results in an alternative classification label as provided by a black-box classifier?