Elliot Vincent

CoDEx: Combining Domain Expertise for Spatial Generalization in Satellite Image Analysis

Apr 28, 2025

Abhishek Kuriyal, Elliot Vincent, Mathieu Aubry, Loic Landrieu

Abstract: Global variations in terrain appearance pose a major challenge for satellite image analysis, leading to poor model performance when the training locations differ from those encountered at test time. This remains true even with recent large global datasets. To address this challenge, we propose a novel domain-generalization framework for satellite images. Instead of trying to learn a single generalizable model, we train one expert model per training domain, while learning the similarity between experts and encouraging similar experts to be consistent. A model selection module then identifies the most suitable experts for a given test sample and aggregates their predictions. Experiments on four datasets (DynamicEarthNet, MUDS, OSCD, and FMoW) demonstrate consistent gains over existing domain generalization and adaptation methods. Our code is publicly available at https://github.com/Abhishek19009/CoDEx.

* CVPR 2025 EarthVision Workshop
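
As a rough illustration of the aggregation step described in the CoDEx abstract, the sketch below combines the class probabilities of several experts for one test sample: it keeps the top_k experts ranked by a selection score and weights them with a softmax over those scores. The function name, the `top_k` parameter and the top-k softmax weighting are assumptions made for illustration; the selection module in CoDEx is learned, and this is not the authors' implementation.

```python
import math
from typing import List, Sequence


def aggregate_experts(expert_probs: Sequence[Sequence[float]],
                      selection_scores: Sequence[float],
                      top_k: int = 2) -> List[float]:
    """Mix the class probabilities of the top_k highest-scoring experts.

    expert_probs[i][c]  -- probability of class c predicted by expert i
    selection_scores[i] -- hypothetical suitability score of expert i for
                           this test sample (stand-in for a learned selector)
    """
    if len(expert_probs) != len(selection_scores):
        raise ValueError("need exactly one selection score per expert")
    if not 1 <= top_k <= len(expert_probs):
        raise ValueError("top_k must be between 1 and the number of experts")
    # Rank experts by score; sorted() is stable, so ties keep expert order.
    chosen = sorted(range(len(selection_scores)),
                    key=lambda i: selection_scores[i], reverse=True)[:top_k]
    # Softmax over the retained scores, shifted by the max for stability.
    peak = max(selection_scores[i] for i in chosen)
    raw = {i: math.exp(selection_scores[i] - peak) for i in chosen}
    norm = sum(raw.values())
    n_classes = len(expert_probs[0])
    mixed = [0.0] * n_classes
    for i, w in raw.items():
        for c in range(n_classes):
            mixed[c] += (w / norm) * expert_probs[i][c]
    return mixed
```

For example, `aggregate_experts([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]], [2.0, -1.0, 2.0], top_k=2)` keeps experts 0 and 2 with equal weight and returns approximately `[0.75, 0.25]`. Restricting the mixture to the top-k experts means a poorly matched expert contributes nothing rather than a small, possibly harmful, share.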

Detecting Looted Archaeological Sites from Satellite Image Time Series

Sep 14, 2024

Elliot Vincent, Mehraïl Saroufim, Jonathan Chemla, Yves Ubelmann, Philippe Marquis, Jean Ponce, Mathieu Aubry

Abstract: Archaeological sites are the physical remains of past human activity and one of the main sources of information about past societies and cultures. However, they are also the target of malevolent human actions, especially in countries that have experienced internal turmoil and conflict. Because monitoring these sites from space is a key step towards their preservation, we introduce the DAFA Looted Sites dataset, a labeled multi-temporal remote sensing dataset containing 55,480 images acquired monthly over 8 years across 675 Afghan archaeological sites, including 135 sites looted during the acquisition period. The dataset is particularly challenging because of the limited number of training samples, the class imbalance, the weak binary annotations available only at the level of the time series, and the subtlety of relevant changes coupled with significant irrelevant ones over a long time period. It is also an interesting playground for assessing the performance of satellite image time series (SITS) classification methods on a real and important use case. We evaluate a large set of baselines, highlight the substantial benefits of using foundation models, and show the additional boost provided by using complete time series instead of a single image.

Historical Printed Ornaments: Dataset and Tasks

Aug 16, 2024

Sayan Kumar Chaki, Zeynep Sonat Baltaci, Elliot Vincent, Remi Emonet, Fabienne Vial-Bonacci, Christelle Bahier-Porte, Mathieu Aubry, Thierry Fournel

Abstract: This paper aims to develop the study of historical printed ornaments with modern unsupervised computer vision. We highlight three complex tasks that are of critical interest to book historians: clustering, element discovery, and unsupervised change localization. For each of these tasks, we introduce an evaluation benchmark, and we adapt and evaluate state-of-the-art models. Our Rey's Ornaments dataset is designed to be a representative example of a set of ornaments historians would be interested in. It focuses on an 18th-century bookseller, Marc-Michel Rey, providing a consistent set of ornaments with wide diversity and representative challenges. Our results highlight the limitations of state-of-the-art models when faced with real data and show that simple baselines such as k-means or congealing can outperform more sophisticated approaches on such data. Our dataset and code can be found at https://printed-ornaments.github.io/.

Satellite Image Time Series Semantic Change Detection: Novel Architecture and Analysis of Domain Shift

Jul 10, 2024

Elliot Vincent, Jean Ponce, Mathieu Aubry

Abstract: Satellite imagery plays a crucial role in monitoring changes happening on Earth's surface and aiding in climate analysis, ecosystem assessment, and disaster response. In this paper, we tackle semantic change detection with satellite image time series (SITS-SCD), which encompasses both change detection and semantic segmentation tasks. We propose a new architecture that improves over the state of the art, scales better with the number of parameters, and leverages long-term temporal information. However, for practical use cases, models need to adapt to spatial and temporal shifts, which remains a challenge. We investigate the impact of temporal and spatial shifts separately on global, multi-year SITS datasets using DynamicEarthNet and MUDS. We show that the spatial domain shift represents the most complex setting and that the impact of temporal shift on performance is more pronounced on change detection than on semantic segmentation, highlighting that it is a specific issue deserving further attention.
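
Because SITS-SCD couples semantic segmentation with change detection, a binary change map can be read off any pair of per-date segmentation maps. The minimal sketch below assumes that "change" simply means a different class index at the same pixel on consecutive dates; the function and type names are hypothetical, and the proposed architecture itself is not reproduced here.

```python
from typing import List

Grid = List[List[int]]  # H x W map of semantic class indices for one date


def binary_change_maps(series: List[Grid]) -> List[List[List[bool]]]:
    """Mark pixels whose semantic class differs between consecutive dates.

    series[t] is the (predicted or ground-truth) segmentation at date t;
    the result holds len(series) - 1 maps, one per pair (t, t + 1).
    """
    changes = []
    for prev, curr in zip(series, series[1:]):
        changes.append([[a != b for a, b in zip(row_prev, row_curr)]
                        for row_prev, row_curr in zip(prev, curr)])
    return changes
```

A pixel labeled 0, then 2, then 2 over three dates is flagged as changed only between the first two dates, so errors in either per-date segmentation propagate directly into the change map.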

OpenStreetView-5M: The Many Roads to Global Visual Geolocation

Apr 29, 2024

Guillaume Astruc, Nicolas Dufour, Ioannis Siglidis, Constantin Aronssohn, Nacim Bouia, Stephanie Fu, Romain Loiseau, Van Nguyen Nguyen, Charles Raude, Elliot Vincent (+3 more)

Abstract: Determining the location of an image anywhere on Earth is a complex visual task, which makes it particularly relevant for evaluating computer vision algorithms. Yet, the absence of standard, large-scale, open-access datasets with reliably localizable images has limited its potential. To address this issue, we introduce OpenStreetView-5M, a large-scale, open-access dataset comprising over 5.1 million geo-referenced street view images, covering 225 countries and territories. In contrast to existing benchmarks, we enforce a strict train/test separation, allowing us to evaluate the relevance of learned geographical features beyond mere memorization. To demonstrate the utility of our dataset, we conduct an extensive benchmark of various state-of-the-art image encoders, spatial representations, and training strategies. All associated code and models can be found at https://github.com/gastruc/osv5m.

* CVPR 2024

Learnable Earth Parser: Discovering 3D Prototypes in Aerial Scans

Apr 19, 2023

Romain Loiseau, Elliot Vincent, Mathieu Aubry, Loic Landrieu

Abstract: We propose an unsupervised method for parsing large 3D scans of real-world scenes into interpretable parts. Our goal is to provide a practical tool for analyzing 3D scenes with unique characteristics in the context of aerial surveying and mapping, without relying on application-specific user annotations. Our approach is based on a probabilistic reconstruction model that decomposes an input 3D point cloud into a small set of learned prototypical shapes. Our model provides an interpretable reconstruction of complex scenes and leads to relevant instance and semantic segmentations. To demonstrate the usefulness of our results, we introduce a novel dataset of seven diverse aerial LiDAR scans. We show that our method outperforms state-of-the-art unsupervised methods in terms of decomposition accuracy while remaining visually interpretable. Our method offers a significant advantage over existing approaches, as it does not require any manual annotations, making it a practical and efficient tool for 3D scene analysis. Our code and dataset are available at https://imagine.enpc.fr/~loiseaur/learnable-earth-parser
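
To make the idea of reconstructing a scene from a small set of prototypical shapes concrete, the sketch below assigns a point-cloud patch to whichever candidate prototype reconstructs it best, measuring reconstruction error with a symmetric Chamfer distance. This is a deliberate simplification under stated assumptions: the prototypes are fixed and already placed in the patch's frame, whereas the Learnable Earth Parser learns its prototypes and their transformations within a probabilistic model. `chamfer` and `best_prototype` are hypothetical names.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def chamfer(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Chamfer distance: mean squared nearest-neighbour distance, both ways."""
    def one_way(src: Sequence[Point], dst: Sequence[Point]) -> float:
        return sum(min(math.dist(p, q) ** 2 for q in dst) for p in src) / len(src)
    return one_way(a, b) + one_way(b, a)


def best_prototype(patch: Sequence[Point],
                   prototypes: Sequence[Sequence[Point]]) -> int:
    """Index of the (fixed, pre-placed) prototype with the lowest Chamfer error."""
    return min(range(len(prototypes)), key=lambda k: chamfer(patch, prototypes[k]))
```

The brute-force nearest-neighbour search is quadratic in the number of points, which is fine for toy inputs but would need a spatial index for real aerial scans.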

Pixel-wise Agricultural Image Time Series Classification: Comparisons and a Deformable Prototype-based Approach

Mar 22, 2023

Elliot Vincent, Jean Ponce, Mathieu Aubry

Abstract: Improvements in Earth observation by satellites allow for imagery of ever higher temporal and spatial resolution. Leveraging this data for agricultural monitoring is key for addressing environmental and economic challenges. Current methods for crop segmentation using temporal data either rely on annotated data or are heavily engineered to compensate for the lack of supervision. In this paper, we present and compare datasets and methods for both supervised and unsupervised pixel-wise segmentation of satellite image time series (SITS). We also introduce an approach to add invariance to spectral deformations and temporal shifts to classical prototype-based methods such as K-means and Nearest Centroid Classifier (NCC). We show that this simple and highly interpretable method leads to meaningful results in both the supervised and unsupervised settings and significantly improves the state of the art for unsupervised classification of agricultural time series on four recent SITS datasets.
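
The temporal-shift invariance added to prototype-based methods can be illustrated with a Nearest Centroid Classifier whose distance takes the minimum over a range of time shifts of each centroid. In this sketch, each time series has a single spectral band, shifted series are padded with the edge value, and `max_shift` is a hypothetical parameter. The approach in the paper also models spectral deformations and learns its prototypes (for instance with K-means), which is not shown here.

```python
from typing import List, Sequence


def shift(series: Sequence[float], s: int) -> List[float]:
    """Delay (s > 0) or advance (s < 0) a series by |s| steps, padding with edge values."""
    n = len(series)
    if abs(s) >= n:
        raise ValueError("shift must be smaller than the series length")
    if s > 0:
        return [series[0]] * s + list(series[:n - s])
    if s < 0:
        return list(series[-s:]) + [series[-1]] * (-s)
    return list(series)


def shift_invariant_distance(x: Sequence[float], centroid: Sequence[float],
                             max_shift: int) -> float:
    """Squared Euclidean distance to the best temporally shifted copy of the centroid."""
    return min(sum((a - b) ** 2 for a, b in zip(x, shift(centroid, s)))
               for s in range(-max_shift, max_shift + 1))


def nearest_centroid(x: Sequence[float], centroids: Sequence[Sequence[float]],
                     max_shift: int = 2) -> int:
    """Nearest Centroid Classifier decision with tolerance to temporal shifts."""
    return min(range(len(centroids)),
               key=lambda k: shift_invariant_distance(x, centroids[k], max_shift))
```

With a centroid `peak = [0, 0, 1, 5, 1, 0, 0, 0]`, a pixel whose profile peaks two dates later matches `peak` exactly when `max_shift=2`. With `max_shift=0`, the same pixel is closer to a flat profile of ones (squared distance 21 versus 52), so the shift tolerance changes the decision.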

A Model You Can Hear: Audio Identification with Playable Prototypes

Aug 05, 2022

Romain Loiseau, Baptiste Bouvier, Yann Teytaut, Elliot Vincent, Mathieu Aubry, Loic Landrieu

Abstract: Machine learning techniques have proved useful for classifying and analyzing audio content. However, recent methods typically rely on abstract and high-dimensional representations that are difficult to interpret. Inspired by transformation-invariant approaches developed for image and 3D data, we propose an audio identification model based on learnable spectral prototypes. Equipped with dedicated transformation networks, these prototypes can be used to cluster and classify input audio samples from large collections of sounds. Our model can be trained with or without supervision and reaches state-of-the-art results for speaker and instrument identification, while remaining easily interpretable. The code is available at: https://github.com/romainloiseau/a-model-you-can-hear
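
A very simple stand-in for a transformation network is a per-prototype gain fitted in closed form. Setting the derivative of ||x - g*p||^2 with respect to the scalar g to zero gives -2<p, x - g*p> = 0, hence g = <x, p> / <p, p>; the input spectrum x is then assigned to the prototype p with the smallest residual. The sketch below implements that rule. The gain-only transformation and the function names are assumptions for illustration, not the learned transformation networks used in the paper.

```python
from typing import Sequence, Tuple


def fit_gain(x: Sequence[float], p: Sequence[float]) -> float:
    """Least-squares gain g minimising ||x - g * p||^2, i.e. g = <x, p> / <p, p>."""
    pp = sum(v * v for v in p)
    if pp == 0:
        raise ValueError("prototype must not be all zeros")
    return sum(a * b for a, b in zip(x, p)) / pp


def identify(x: Sequence[float],
             prototypes: Sequence[Sequence[float]]) -> Tuple[int, float]:
    """Return (index, residual) of the prototype that best fits x after a gain."""
    best_k, best_err = -1, float("inf")
    for k, p in enumerate(prototypes):
        g = fit_gain(x, p)
        err = sum((a - g * b) ** 2 for a, b in zip(x, p))
        if err < best_err:
            best_k, best_err = k, err
    return best_k, best_err
```

For instance, a spectrum equal to three times the first prototype is identified as that prototype with a fitted gain of 3 and zero residual, regardless of how loud it is compared with the other prototypes.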

Unsupervised Layered Image Decomposition into Object Prototypes

Apr 29, 2021

Tom Monnier, Elliot Vincent, Jean Ponce, Mathieu Aubry

Abstract: We present an unsupervised learning framework for decomposing images into layers of automatically discovered object models. Unlike recent approaches that model image layers with autoencoder networks, we represent them as explicit transformations of a small set of prototypical images. Our model has three main components: (i) a set of object prototypes in the form of learnable images with a transparency channel, which we refer to as sprites; (ii) differentiable parametric functions predicting occlusions and transformation parameters necessary to instantiate the sprites in a given image; (iii) a layered image formation model with occlusion for compositing these instances into complete images, including the background. By jointly learning the sprites and occlusion/transformation predictors to reconstruct images, our approach not only yields accurate layered image decompositions, but also identifies object categories and instance parameters. We first validate our approach by providing results on par with the state of the art on standard multi-object synthetic benchmarks (Tetrominoes, Multi-dSprites, CLEVR6). We then demonstrate the applicability of our model to real images in tasks that include clustering (SVHN, GTSRB), cosegmentation (Weizmann Horse) and object discovery from unfiltered social network images. To the best of our knowledge, our approach is the first layered image decomposition algorithm that learns an explicit and shared concept of object type and is robust enough to be applied to real images.

* Project webpage: https://imagine.enpc.fr/~monniert/DTI-Sprites
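
The layered image formation step can be sketched as back-to-front alpha compositing: each layer's colour is blended over everything beneath it according to its transparency channel. This minimal version assumes grayscale images flattened to lists, sprites already transformed into image coordinates, and a fixed depth order, whereas the model described above predicts both the occlusions and the transformations. `composite` and `Layer` are hypothetical names.

```python
from typing import List, Sequence, Tuple

# One layer = (colour, alpha), each a flattened H*W list of values in [0, 1].
Layer = Tuple[Sequence[float], Sequence[float]]


def composite(background: Sequence[float], layers: Sequence[Layer]) -> List[float]:
    """Back-to-front 'over' compositing: layers[0] is farthest, layers[-1] nearest."""
    image = list(background)
    for colour, alpha in layers:
        # Each pixel keeps alpha of the new layer and (1 - alpha) of what lies below.
        image = [a * c + (1.0 - a) * below
                 for c, a, below in zip(colour, alpha, image)]
    return image
```

With a black background, a white far layer with alpha [1, 1, 0] and a gray (0.5) near layer with alpha [0, 1, 1], the result is [1.0, 0.5, 0.5]; swapping the depth order gives [1.0, 1.0, 0.5], which shows why the occlusion order has to be predicted along with the sprites.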
