
Diego Marcos

PDiscoNet: Semantically consistent part discovery for fine-grained recognition

Sep 06, 2023
Robert van der Klis, Stephan Alaniz, Massimiliano Mancini, Cassio F. Dantas, Dino Ienco, Zeynep Akata, Diego Marcos

Fine-grained classification often requires recognizing specific object parts, such as beak shape and wing patterns for birds. Encouraging a fine-grained classification model to first detect such parts and then use them to infer the class lets us gauge whether the model is indeed looking at the right details, more directly than interpretability methods that provide only a single attribution map. We propose PDiscoNet, which discovers object parts using only image-level class labels together with priors encouraging the parts to be: discriminative, compact, distinct from each other, equivariant to rigid transforms, and active in at least some of the images. In addition to the losses encoding these priors, we propose part-dropout, where full part feature vectors are dropped at once to prevent a single part from dominating the classification, and part feature vector modulation, which makes the information coming from each part distinct from the perspective of the classifier. Our results on CUB, CelebA, and PartImageNet show that the proposed method achieves substantially better part discovery than previous methods, without requiring any additional hyper-parameter tuning and without penalizing classification performance. The code is available at https://github.com/robertdvdk/part_detection.
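
As an illustration of the part-dropout idea, the sketch below zeroes entire part feature vectors at once during training, so that no single part can dominate the classifier's decision. The tensor shapes, drop probability, and inverted-dropout rescaling are assumptions for illustration, not the authors' exact implementation.

```python
# Minimal sketch of part-dropout: whole part feature vectors are dropped
# at once (not individual activations), assumed shapes and rescaling.
import torch
import torch.nn as nn

class PartDropout(nn.Module):
    def __init__(self, p: float = 0.3):
        super().__init__()
        self.p = p

    def forward(self, part_feats: torch.Tensor) -> torch.Tensor:
        # part_feats: (batch, num_parts, feat_dim)
        if not self.training or self.p == 0.0:
            return part_feats
        batch, num_parts, _ = part_feats.shape
        # One Bernoulli mask per part, broadcast over the feature dimension.
        keep = (torch.rand(batch, num_parts, 1, device=part_feats.device) > self.p).float()
        # Inverted-dropout rescaling keeps the expected activation unchanged.
        return part_feats * keep / (1.0 - self.p)

# Usage: feats = PartDropout(p=0.3)(torch.randn(8, 6, 256))
```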

* 9 pages, 8 figures, ICCV 2023 

Time Series Analysis of Urban Liveability

Sep 01, 2023
Alex Levering, Diego Marcos, Devis Tuia

In this paper we explore deep learning models to monitor longitudinal liveability changes in Dutch cities at the neighbourhood level. Our liveability reference data come from a country-wide yearly survey whose indicators are combined into a single liveability score, the Leefbaarometer. We pair this reference data with yearly-available high-resolution aerial images, creating yearly timesteps at which liveability can be monitored. We train a convolutional neural network on aerial images and Leefbaarometer scores from 2016 and use it to predict liveability at two new timesteps, 2012 and 2020. The results in a city used for training (Amsterdam) and one never seen during training (Eindhoven) show some trends that are difficult to interpret, especially in light of the differences in image acquisition across timesteps. This demonstrates the complexity of liveability monitoring over time and the need for more sophisticated methods that compensate for changes unrelated to liveability dynamics.
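
The setup described above boils down to regressing a scalar liveability score from aerial imagery. A minimal sketch of such a regressor, assuming a ResNet-18 backbone and 224x224 patches (both illustrative choices, not necessarily those of the paper):

```python
# Hedged sketch: CNN regression of a liveability score from aerial patches,
# trained on one year and reused on imagery from other years.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class LiveabilityRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = resnet18(weights=None)  # backbone choice is an assumption
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, 1)

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        # patches: (batch, 3, H, W) aerial tiles for a neighbourhood
        return self.backbone(patches).squeeze(-1)  # predicted liveability score

model = LiveabilityRegressor()
score_2012 = model(torch.randn(4, 3, 224, 224))  # apply to another timestep
```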

* 2023 Joint Urban Remote Sensing Event (JURSE), Heraklion, Greece, 2023, pp. 1-4  
* Accepted at JURSE 2023 

Masking Strategies for Background Bias Removal in Computer Vision Models

Aug 23, 2023
Ananthu Aniraj, Cassio F. Dantas, Dino Ienco, Diego Marcos

Models for fine-grained image classification tasks, where the differences between some classes can be extremely subtle and the number of samples per class tends to be low, are particularly prone to picking up background-related biases and demand robust methods to handle potential examples with out-of-distribution (OOD) backgrounds. To gain deeper insight into this problem, we investigate the impact of background-induced bias on fine-grained image classification, evaluating standard backbones such as Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). We explore two masking strategies to mitigate background-induced bias: early masking, which removes background information at the (input) image level, and late masking, which selectively masks high-level spatial features corresponding to the background. Extensive experiments assess the behavior of CNN and ViT models under both strategies, with a focus on their generalization to OOD backgrounds. The findings show that both strategies improve OOD performance over the baseline models, with early masking consistently performing best. Notably, a ViT variant combining GAP-pooled patch-token-based classification with early masking achieves the highest OOD robustness.
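
The two strategies contrast naturally in code. A hedged sketch, assuming a binary foreground mask is available (e.g. from a segmentation model) and that late masking is applied to the backbone's final spatial feature map:

```python
# Early masking zeroes background pixels at the input; late masking zeroes
# background positions of the high-level feature map. Shapes are assumptions.
import torch
import torch.nn.functional as F

def early_masking(image: torch.Tensor, fg_mask: torch.Tensor) -> torch.Tensor:
    # image: (B, 3, H, W); fg_mask: (B, 1, H, W) float, 1 = foreground
    return image * fg_mask

def late_masking(feats: torch.Tensor, fg_mask: torch.Tensor) -> torch.Tensor:
    # feats: (B, C, h, w) spatial features from the backbone; the mask is
    # downsampled to the feature resolution before being applied.
    mask = F.interpolate(fg_mask, size=feats.shape[-2:], mode="nearest")
    return feats * mask
```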

* Accepted at the 2023 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW) on Out Of Distribution Generalization in Computer Vision (OOD-CV) 

Sparse Linear Concept Discovery Models

Aug 21, 2023
Konstantinos P. Panousis, Dino Ienco, Diego Marcos

The recent mass adoption of DNNs, even in safety-critical scenarios, has shifted the focus of the research community towards the creation of inherently interpretable models. Concept Bottleneck Models (CBMs) are a popular approach in which hidden layers are tied to human-understandable concepts, allowing the network's decisions to be investigated and corrected. However, CBMs usually suffer from (i) performance degradation and (ii) lower interpretability than intended, due to the sheer number of concepts contributing to each decision. In this work, we propose a simple yet highly intuitive interpretable framework based on Contrastive Language-Image models and a single sparse linear layer. In stark contrast to related approaches, the sparsity in our framework is achieved via principled Bayesian arguments, by inferring concept presence via a data-driven Bernoulli distribution. As we show experimentally, our framework not only outperforms recent CBM approaches in accuracy, but also yields high per-example concept sparsity, facilitating the individual investigation of the emerging concepts.
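
A minimal sketch of the kind of sparse linear layer described above: concept similarities from a frozen CLIP-style model feed a single linear classifier whose weights are gated by Bernoulli variables, relaxed here with the binary-concrete trick so training stays differentiable. The temperature, initialization, and hard thresholding at test time are assumptions for illustration:

```python
# Hedged sketch: a sparse linear concept classifier with relaxed Bernoulli
# gates on its weights; not the paper's exact inference scheme.
import torch
import torch.nn as nn

class SparseConceptClassifier(nn.Module):
    def __init__(self, num_concepts: int, num_classes: int, temp: float = 0.1):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, num_concepts) * 0.01)
        self.gate_logits = nn.Parameter(torch.zeros(num_classes, num_concepts))
        self.temp = temp

    def forward(self, concept_sims: torch.Tensor) -> torch.Tensor:
        # concept_sims: (batch, num_concepts) image-to-concept similarities
        if self.training:
            # Relaxed Bernoulli (binary concrete) sample keeps gates differentiable.
            u = torch.rand_like(self.gate_logits).clamp(1e-6, 1 - 1e-6)
            noise = torch.log(u) - torch.log1p(-u)
            z = torch.sigmoid((self.gate_logits + noise) / self.temp)
        else:
            # Hard gates at test time: only a few concepts survive per class.
            z = (torch.sigmoid(self.gate_logits) > 0.5).float()
        return concept_sims @ (self.weight * z).t()
```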

* Accepted @ ICCVW CLVL 2023 

Counterfactual Explanations for Land Cover Mapping in a Multi-class Setting

Jan 04, 2023
Cassio F. Dantas, Diego Marcos, Dino Ienco

Counterfactual explanations are an emerging tool for enhancing the interpretability of deep learning models. Given a sample, these methods seek to find, and display to the user, similar samples across the decision boundary. In this paper, we propose a generative adversarial counterfactual approach for satellite image time series in a multi-class land cover classification setting. A distinctive feature of the proposed approach is that it makes no prior assumption on the target class of a counterfactual explanation; this flexibility allows the discovery of interesting information about the relationships between land cover classes. A second feature is that the counterfactual is encouraged to differ from the original sample only in a small and compact temporal segment. These time-contiguous perturbations allow for a much sparser and, thus, more interpretable solution. Furthermore, the plausibility and realism of the generated counterfactual explanations are enforced via the proposed adversarial learning strategy.
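
The time-contiguity prior lends itself to a simple penalty on the perturbation. A hedged sketch, where a sparsity term keeps the number of changed timesteps small and a total-variation term keeps the changes contiguous; the exact losses used in the paper may differ:

```python
# Illustrative penalty encouraging a counterfactual time series to differ
# from the original only in a small, contiguous temporal segment.
import torch

def contiguity_loss(original: torch.Tensor, counterfactual: torch.Tensor) -> torch.Tensor:
    # original, counterfactual: (batch, timesteps, bands)
    delta = (counterfactual - original).abs().sum(dim=-1)  # (batch, timesteps)
    sparsity = delta.mean()                                # few timesteps change
    tv = (delta[:, 1:] - delta[:, :-1]).abs().mean()       # changes are contiguous
    return sparsity + tv
```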


Abstracting Sketches through Simple Primitives

Jul 27, 2022
Stephan Alaniz, Massimiliano Mancini, Anjan Dutta, Diego Marcos, Zeynep Akata

Humans show a high level of abstraction capability in games that require quickly communicating object information. They decompose the message content into multiple parts and communicate them in an interpretable protocol. Toward equipping machines with such capabilities, we propose the Primitive-based Sketch Abstraction task, where the goal is to represent sketches using a fixed set of drawing primitives under a given budget. To solve this task, our Primitive-Matching Network (PMN) learns interpretable abstractions of a sketch in a self-supervised manner. Specifically, PMN maps each stroke of a sketch to its most similar primitive in a given set, predicting an affine transformation that aligns the selected primitive to the target stroke. We learn this stroke-to-primitive mapping end-to-end with a distance-transform loss that is minimal when the original sketch is precisely reconstructed with the predicted primitives. Our PMN abstraction empirically achieves the highest performance on sketch recognition and sketch-based image retrieval given a communication budget, while at the same time being highly interpretable. This opens up new possibilities for sketch analysis, such as comparing sketches by extracting the most relevant primitives that define an object category. Code is available at https://github.com/ExplainableML/sketch-primitives.
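
A minimal sketch of the stroke-to-primitive step: each stroke embedding is matched to its most similar primitive, and an affine transform (a 2x3 matrix) is regressed to align that primitive to the stroke. Embedding sizes are assumptions, and the hard argmax shown here would need a soft or straight-through variant to train end-to-end as the paper does:

```python
# Hedged sketch of stroke-to-primitive matching with affine alignment.
import torch
import torch.nn as nn

class StrokeToPrimitive(nn.Module):
    def __init__(self, num_primitives: int, dim: int = 128):
        super().__init__()
        self.primitive_emb = nn.Parameter(torch.randn(num_primitives, dim))
        self.affine_head = nn.Linear(2 * dim, 6)  # predicts a 2x3 affine matrix

    def forward(self, stroke_emb: torch.Tensor):
        # stroke_emb: (num_strokes, dim) embeddings of the sketch's strokes
        sims = stroke_emb @ self.primitive_emb.t()
        idx = sims.argmax(dim=-1)                  # most similar primitive
        chosen = self.primitive_emb[idx]
        # Regress the transform from the (stroke, primitive) pair.
        affine = self.affine_head(torch.cat([stroke_emb, chosen], dim=-1))
        return idx, affine.view(-1, 2, 3)
```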

* European Conference on Computer Vision (ECCV) 2022 

A weakly supervised framework for high-resolution crop yield forecasts

May 18, 2022
Dilli R. Paudel, Diego Marcos, Allard de Wit, Hendrik Boogaard, Ioannis N. Athanasiadis

Predictor inputs and label data for crop yield forecasting are not always available at the same spatial resolution. We propose a deep learning framework that uses high-resolution inputs and low-resolution labels to produce crop yield forecasts at both spatial levels. The forecasting model is calibrated by weak supervision from low-resolution crop area and yield statistics. We evaluate the framework by disaggregating regional yields in Europe from parent statistical regions to sub-regions for five countries (Germany, Spain, France, Hungary, Italy) and two crops (soft wheat and potatoes). The performance of the weakly supervised models is compared with linear trend models and Gradient-Boosted Decision Trees (GBDT). Higher-resolution crop yield forecasts are useful to policymakers and other stakeholders, and weakly supervised deep learning methods provide a way to produce such forecasts even in the absence of high-resolution yield data.
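
The weak-supervision signal can be written compactly: sub-region predictions are aggregated up to the parent region and compared against the available regional statistic. A sketch assuming crop-area-weighted mean aggregation, which is an illustrative choice:

```python
# Hedged sketch: supervise high-resolution predictions only through their
# aggregate at the resolution where labels exist.
import torch

def weak_supervision_loss(subregion_pred: torch.Tensor,
                          crop_area: torch.Tensor,
                          region_yield: torch.Tensor) -> torch.Tensor:
    # subregion_pred, crop_area: (num_subregions,); region_yield: scalar
    weights = crop_area / crop_area.sum()
    region_pred = (weights * subregion_pred).sum()  # aggregate to parent region
    return (region_pred - region_yield) ** 2
```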

* Appeared at the AI for Earth Sciences workshop at @ICLR2022, April 29, 2022. https://ai4earthscience.github.io/iclr-2022-workshop/ 

Self-supervised pre-training enhances change detection in Sentinel-2 imagery

Jan 20, 2021
Marrit Leenstra, Diego Marcos, Francesca Bovolo, Devis Tuia

While annotated images for change detection using satellite imagery are scarce and costly to obtain, a wealth of unlabeled images is generated every day. To leverage these data and learn an image representation better suited for change detection, we explore methods that exploit the temporal consistency of Sentinel-2 time series to obtain a usable self-supervised learning signal. For this, we build and make publicly available (https://zenodo.org/record/4280482) the Sentinel-2 Multitemporal Cities Pairs (S2MTCP) dataset, containing multitemporal image pairs from 1520 urban areas worldwide. We test multiple self-supervised learning methods for pre-training change detection models and apply them to a public change detection dataset made of Sentinel-2 image pairs (OSCD).
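
One pretext task compatible with the temporal-consistency idea above is to decide whether two patches from different acquisition dates cover the same location. A hedged sketch of such a discriminator; the paper evaluates several self-supervised tasks, and this particular formulation is an assumption:

```python
# Illustrative pretext task: same-location classification across two dates.
# Positives are co-located patches from the image pair, negatives are not.
import torch
import torch.nn as nn

class PatchPairDiscriminator(nn.Module):
    def __init__(self, encoder: nn.Module, feat_dim: int):
        super().__init__()
        self.encoder = encoder          # any backbone mapping patches to feat_dim
        self.head = nn.Linear(2 * feat_dim, 1)

    def forward(self, patch_t1: torch.Tensor, patch_t2: torch.Tensor) -> torch.Tensor:
        z1, z2 = self.encoder(patch_t1), self.encoder(patch_t2)
        # Logit for "same location"; train with binary cross-entropy, then
        # reuse the encoder weights for the downstream change detection model.
        return self.head(torch.cat([z1, z2], dim=-1)).squeeze(-1)
```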


Semantic Segmentation of Remote Sensing Images with Sparse Annotations

Jan 10, 2021
Yuansheng Hua, Diego Marcos, Lichao Mou, Xiao Xiang Zhu, Devis Tuia

Training Convolutional Neural Networks (CNNs) on very high resolution images requires a large quantity of high-quality pixel-level annotations, which is extremely labor- and time-consuming to produce. Moreover, professional photo interpreters might have to be involved to guarantee the correctness of the annotations. To alleviate this burden, we propose a framework for semantic segmentation of aerial images based on incomplete annotations, where annotators are asked to label a few pixels with easy-to-draw scribbles. To exploit these sparse scribbled annotations, we propose the FEature and Spatial relaTional regulArization (FESTA) method, which complements the supervised task with an unsupervised learning signal that accounts for neighbourhood structure in both the spatial and the feature domain.
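
On the supervised side, training with scribbles amounts to computing the loss only on annotated pixels. A minimal sketch, with a simplified smoothness term standing in for FESTA's relational regularizer (the full method is not reproduced here); the -1 ignore-label convention is an assumption:

```python
# Hedged sketch: partial cross-entropy on scribbled pixels plus a simplified
# spatial consistency term on the features.
import torch
import torch.nn.functional as F

def partial_cross_entropy(logits: torch.Tensor, scribbles: torch.Tensor) -> torch.Tensor:
    # logits: (B, num_classes, H, W); scribbles: (B, H, W) long, -1 = unlabeled
    return F.cross_entropy(logits, scribbles, ignore_index=-1)

def spatial_consistency(feats: torch.Tensor) -> torch.Tensor:
    # feats: (B, C, H, W); encourage neighbouring features to be similar,
    # a simplified stand-in for FESTA's spatial relational term.
    dh = (feats[:, :, 1:, :] - feats[:, :, :-1, :]).pow(2).mean()
    dw = (feats[:, :, :, 1:] - feats[:, :, :, :-1]).pow(2).mean()
    return dh + dw
```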


Multi-temporal and multi-source remote sensing image classification by nonlinear relative normalization

Dec 07, 2020
Devis Tuia, Diego Marcos, Gustau Camps-Valls

Remote sensing image classification exploiting multiple sensors is a very challenging problem: data from different modalities are affected by spectral distortions and mis-alignments of all kinds, which hampers re-using models built for one image in other scenes. In order to adapt and transfer models across image acquisitions, one must be able to cope with datasets that are not co-registered, acquired under different illumination and atmospheric conditions, by different sensors, and with scarce ground references. Traditionally, methods based on histogram matching have been used, but they fail when the densities have very different shapes or when there is no corresponding band to be matched between the images. An alternative builds upon manifold alignment, which performs a multidimensional relative normalization of the data prior to product generation and can cope with data of different dimensionality (e.g. different numbers of bands) and possibly unpaired examples. Aligning data distributions is an appealing strategy, since it yields data spaces that are more similar to each other, regardless of the subsequent use of the transformed data. In this paper, we study a methodology that aligns data from different domains in a nonlinear way through kernelization. We introduce the Kernel Manifold Alignment (KEMA) method, which provides a flexible and discriminative projection map, exploits only a few labeled samples (or semantic ties) in each domain, and reduces to solving a generalized eigenvalue problem. We successfully test KEMA on multi-temporal and multi-source very high resolution classification tasks, as well as on the task of making a model invariant to shadowing for hyperspectral imaging.
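
The abstract notes that KEMA reduces to solving a generalized eigenvalue problem. A sketch of that final step with synthetic placeholder matrices; constructing the actual similarity and regularization matrices from the kernels and semantic ties follows the paper and is not reproduced here:

```python
# Hedged sketch: the generalized eigenvalue step A v = lambda B v that
# yields the projection map, on placeholder positive-definite matrices.
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
n = 50
# Placeholders standing in for the kernel similarity (A) and
# dissimilarity/regularization (B) terms built from the aligned domains.
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)
M = rng.standard_normal((n, n))
B = M @ M.T + n * np.eye(n)

# scipy solves A v = lambda B v; the selected eigenvectors define the
# nonlinear projection into the shared latent space.
eigvals, eigvecs = eigh(A, B)
projection = eigvecs[:, :10]  # keep the first 10 aligned dimensions
```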

* ISPRS Journal of Photogrammetry and Remote Sensing 120, DOI: 10.1016/j.isprsjprs.2016.07.004  