EPFL
Abstract: Automated wildlife surveys based on drone imagery and object detection technology are a powerful and increasingly popular tool in conservation biology. Most detectors require training images with annotated bounding boxes, which are tedious and expensive to create and not always unambiguous. To reduce the annotation load associated with this practice, we develop POLO, a multi-class object detection model that can be trained entirely on point labels. POLO is based on simple yet effective modifications to the YOLOv8 architecture, including alterations to the prediction process, training losses, and post-processing. We test POLO on drone recordings of waterfowl containing up to several thousand individual birds per image and compare it to a regular YOLOv8. Our experiments show that, at the same annotation cost, POLO achieves improved accuracy in counting animals in aerial imagery.
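To make the point-label idea concrete, here is a minimal sketch (not the authors' code) of a point-supervised localization loss: predicted centers are matched one-to-one to annotated points and penalized by their distance. The Hungarian matching and the mean-distance cost are illustrative assumptions, not POLO's actual YOLOv8-based formulation.

```python
import torch
from scipy.optimize import linear_sum_assignment

def point_supervision_loss(pred_points, gt_points):
    """pred_points: (N, 2) predicted animal centers; gt_points: (M, 2) point labels."""
    # Pairwise Euclidean distances between predictions and annotations.
    cost = torch.cdist(pred_points, gt_points)            # (N, M)
    # One-to-one assignment so each annotated animal is counted exactly once.
    row, col = linear_sum_assignment(cost.detach().cpu().numpy())
    matched = cost[torch.as_tensor(row), torch.as_tensor(col)]
    return matched.mean()
```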
Abstract: Prototypical part learning is emerging as a promising approach for making semantic segmentation interpretable. The model selects real patches seen during training as prototypes and constructs the dense prediction map based on the similarity between parts of the test image and the prototypes. This improves interpretability since the user can inspect the link between the predicted output and the patterns learned by the model in terms of prototypical information. In this paper, we propose a method for interpretable semantic segmentation that leverages multi-scale image representation for prototypical part learning. First, we introduce a prototype layer that explicitly learns diverse prototypical parts at several scales, leading to multi-scale representations in the prototype activation output. Then, we propose a sparse grouping mechanism that produces multi-scale sparse groups of these scale-specific prototypical parts. This provides a deeper understanding of the interactions between multi-scale object representations while enhancing the interpretability of the segmentation model. The experiments conducted on Pascal VOC, Cityscapes, and ADE20K demonstrate that the proposed method increases model sparsity, improves interpretability over existing prototype-based methods, and narrows the performance gap with the non-interpretable counterpart models. Code is available at github.com/eceo-epfl/ScaleProtoSeg.
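As a hedged illustration of the core mechanism, the sketch below computes prototype activation maps as cosine similarities between dense features and scale-specific prototype banks; the names, shapes, and pooling scheme are our assumptions, not the released ScaleProtoSeg code (see the repository above for the actual implementation).

```python
import torch
import torch.nn.functional as F

def prototype_activations(features, prototypes):
    """features: (B, C, H, W) backbone output; prototypes: (P, C) learned parts."""
    f = F.normalize(features, dim=1)              # unit-norm features
    p = F.normalize(prototypes, dim=1)            # unit-norm prototypes
    # Cosine similarity between every spatial location and every prototype.
    return torch.einsum("bchw,pc->bphw", f, p)    # (B, P, H, W) activation maps

def multiscale_activations(features, prototype_banks, scales=(1, 2, 4)):
    """One prototype bank per scale; coarser scales see pooled features."""
    maps = []
    for s, bank in zip(scales, prototype_banks):
        pooled = F.avg_pool2d(features, kernel_size=s) if s > 1 else features
        maps.append(prototype_activations(pooled, bank))
    return maps  # list of (B, P_s, H/s, W/s) maps, one per scale
```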
Abstract: Reducing speckle fluctuations in multi-channel SAR images is essential in many applications of SAR imaging, such as polarimetric classification or interferometric height estimation. While single-channel despeckling has widely benefited from the application of deep learning techniques, extensions to multi-channel SAR images are much more challenging. This paper introduces MuChaPro, a generic framework that exploits existing single-channel despeckling methods. The key idea is to generate numerous single-channel projections, restore these projections, and recombine them into the final multi-channel estimate. This simple approach is shown to be effective in polarimetric and/or interferometric modalities. A special appeal of MuChaPro is the possibility of applying a self-supervised training strategy to learn sensor-specific networks for single-channel despeckling.
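A minimal sketch of the projection/restore/recombine loop, assuming real-valued channels, random unit-norm projection vectors, and a least-squares recombination; the actual MuChaPro projections and estimator may differ, and `despeckle_single_channel` stands in for any pretrained single-channel despeckler.

```python
import numpy as np

def muchapro_like(multichannel, despeckle_single_channel, n_proj=32, rng=None):
    """multichannel: (K, H, W) channels of a multi-channel SAR image."""
    if rng is None:
        rng = np.random.default_rng(0)
    K = multichannel.shape[0]
    weights, restored = [], []
    for _ in range(n_proj):
        w = rng.standard_normal(K)
        w /= np.linalg.norm(w)
        proj = np.tensordot(w, multichannel, axes=1)     # single-channel projection
        restored.append(despeckle_single_channel(proj))  # restore the projection
        weights.append(w)
    A = np.stack(weights)                                # (n_proj, K)
    Y = np.stack(restored)                               # (n_proj, H, W)
    # Recombine: least-squares estimate of the K channels from the projections.
    return np.tensordot(np.linalg.pinv(A), Y, axes=1)    # (K, H, W)
```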
Abstract: Vision-language foundation models such as CLIP have shown impressive zero-shot performance on many tasks and datasets, especially thanks to their free-text inputs. However, they struggle to handle some downstream tasks, such as fine-grained attribute detection and localization. In this paper, we propose a multitask fine-tuning strategy based on a positive/negative prompt formulation to further leverage the capacities of vision-language foundation models. Using the CLIP architecture as a baseline, we show strong improvements on bird fine-grained attribute detection and localization tasks, while also increasing classification performance on the CUB200-2011 dataset. We provide source code for reproducibility purposes: it is available at https://github.com/FactoDeepLearning/MultitaskVLFM.
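A minimal sketch of the positive/negative prompt formulation at inference time, using OpenAI's public `clip` package; the prompt templates and the softmax readout are illustrative assumptions rather than the paper's fine-tuning recipe (see the repository above for the actual code).

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def attribute_score(image_path, attribute="a red beak"):
    """Probability that the image shows the attribute, via a pos/neg prompt pair."""
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    prompts = clip.tokenize([
        f"a photo of a bird with {attribute}",     # positive prompt
        f"a photo of a bird without {attribute}",  # negative prompt
    ]).to(device)
    with torch.no_grad():
        logits_per_image, _ = model(image, prompts)
        probs = logits_per_image.softmax(dim=-1)
    return probs[0, 0].item()  # mass assigned to the positive prompt
```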
Abstract: Speckle filtering is generally a prerequisite to the analysis of synthetic aperture radar (SAR) images. Tremendous progress has been achieved in the domain of single-image despeckling. The latest techniques rely on deep neural networks to restore the various structures and textures peculiar to SAR images. The availability of time series of SAR images offers the possibility of improving speckle filtering by combining different speckle realizations over the same area. The supervised training of deep neural networks requires ground-truth speckle-free images. Such images can only be obtained indirectly through some form of averaging, by spatial or temporal integration, and are imperfect. Given the potential of very high quality restoration reachable by multi-temporal speckle filtering, the limitations of ground-truth images need to be circumvented. We extend a recent self-supervised training strategy for single-look complex SAR images, called MERLIN, to the case of multi-temporal filtering. This requires modeling the sources of statistical dependencies in the spatial and temporal dimensions as well as between the real and imaginary components of the complex amplitudes. Quantitative analysis on datasets with simulated speckle indicates a clear improvement of speckle reduction when additional SAR images are included. Our method is then applied to stacks of TerraSAR-X images and shown to outperform competing multi-temporal speckle filtering approaches. The code of the trained models is made freely available on the Gitlab of the IMAGES team of the LTCI Lab, Télécom Paris, Institut Polytechnique de Paris (https://gitlab.telecom-paris.fr/ring/multi-temporal-merlin/).
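A hedged sketch of what a multi-temporal MERLIN-style training criterion can look like: the network receives the real parts of the whole stack and predicts the log-reflectivity of the target date, supervised by that date's imaginary part, whose speckle is independent of the input. The network interface and shapes are our assumptions, not the released code.

```python
import torch

def multitemporal_merlin_loss(net, slc_stack, t):
    """slc_stack: (B, T, H, W) complex single-look images; t: target date index."""
    a = slc_stack.real                   # real parts of all dates, network input
    b = slc_stack.imag[:, t:t + 1]       # held-out imaginary part of date t
    log_r = net(a)                       # (B, 1, H, W) predicted log-reflectivity
    # Gaussian NLL of b ~ N(0, r/2), constants dropped.
    return (0.5 * log_r + b.pow(2) / torch.exp(log_r)).mean()
```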
Abstract: Reducing speckle and limiting the variations of the physical parameters in Synthetic Aperture Radar (SAR) images is often a key step to fully exploit the potential of such data. Nowadays, deep learning approaches produce state-of-the-art results in single-image SAR restoration. Nevertheless, huge multi-temporal stacks are now often available and could be efficiently exploited to further improve image quality. This paper explores two fast strategies employing a single-image despeckling algorithm, namely SAR2SAR, in a multi-temporal framework. The first one is based on the Quegan filter and replaces the local reflectivity pre-estimation by SAR2SAR. The second one uses SAR2SAR to suppress speckle from a ratio image that encodes the multi-temporal information through a "super-image", i.e., the temporal arithmetic mean of the time series. Experimental results on Sentinel-1 GRD data show that these two multi-temporal strategies provide improved filtering results while adding a limited computational cost.
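A minimal sketch of the first strategy: the classical Quegan multi-temporal filter, with the local reflectivity pre-estimates replaced by the outputs of a learned single-image despeckler (a stand-in for SAR2SAR, passed as a callable; this interface is our assumption).

```python
import numpy as np

def quegan_with_learned_preestimates(stack, despeckler, eps=1e-6):
    """stack: (T, H, W) co-registered intensity images of the same area."""
    # Pre-estimate each date's reflectivity with the despeckler, replacing
    # the local-mean estimate of the original Quegan filter.
    sigma = np.stack([despeckler(img) for img in stack])  # (T, H, W)
    T = stack.shape[0]
    # Quegan combination: J_k = (sigma_k / T) * sum_i I_i / sigma_i
    ratio_sum = (stack / (sigma + eps)).sum(axis=0)       # (H, W)
    return sigma * ratio_sum / T                          # (T, H, W) filtered stack
```

The second, "super-image" strategy corresponds to the ratio-image scheme sketched after the final abstract below.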
Abstract: Speckle fluctuations seriously limit the interpretability of synthetic aperture radar (SAR) images. Speckle reduction has thus been the subject of numerous works spanning at least four decades. Techniques based on deep neural networks have recently achieved a new level of performance in terms of SAR image restoration quality. Beyond the design of suitable network architectures or the selection of adequate loss functions, the construction of training sets is of utmost importance. So far, most approaches have considered a supervised training strategy: the networks are trained to produce outputs as close as possible to speckle-free reference images. Speckle-free images are generally not available, which requires resorting to natural or optical images, or to the selection of stable areas in long time series, to circumvent the lack of ground truth. Self-supervision, on the other hand, avoids the use of speckle-free images. We introduce a self-supervised strategy based on the separation of the real and imaginary parts of single-look complex SAR images, called MERLIN (coMplex sElf-supeRvised despeckLINg), and show that it offers a straightforward way to train all kinds of deep despeckling networks. Networks trained with MERLIN take into account the spatial correlations due to the SAR transfer function specific to a given sensor and imaging mode. By requiring only a single image, and possibly exploiting large archives, MERLIN opens the door to hassle-free as well as large-scale training of despeckling networks. The code of the trained models is made freely available at https://gitlab.telecom-paris.fr/RING/MERLIN.
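A minimal sketch of a MERLIN-style self-supervised loss, assuming a network that maps one part of the complex image to a log-reflectivity map; the exact normalization and the symmetric averaging over the two parts are illustrative choices, not necessarily the authors' published recipe (see the Gitlab link above).

```python
import torch

def gaussian_nll(log_r, x):
    # Negative log-likelihood of x ~ N(0, r/2) with r = exp(log_r), constants dropped.
    return (0.5 * log_r + x.pow(2) / torch.exp(log_r)).mean()

def merlin_style_loss(net, slc):
    """slc: (B, 1, H, W) complex single-look image. Given the reflectivity, the
    real and imaginary parts carry independent speckle, so each part can
    supervise a prediction made from the other."""
    a, b = slc.real, slc.imag
    return 0.5 * (gaussian_nll(net(a), b) + gaussian_nll(net(b), a))
```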
Abstract: Remote sensing provides valuable information about objects or areas from a distance in either active (e.g., RADAR and LiDAR) or passive (e.g., multispectral and hyperspectral) modes. The quality of data acquired by remotely sensed imaging sensors (both active and passive) is often degraded by a variety of noise types and artifacts. Image restoration, a vibrant field of research in the remote sensing community, is the task of recovering the true unknown image from the degraded observed image. Each imaging sensor induces unique noise types and artifacts into the observed image. This fact has led restoration techniques to expand along different paths according to each sensor type. This review paper brings together the advances of image restoration techniques, with a particular focus on synthetic aperture radar and hyperspectral images as the most active sub-fields of image restoration in the remote sensing community. We therefore provide a comprehensive, discipline-specific starting point for researchers at different levels (i.e., students, researchers, and senior researchers) who wish to investigate the vibrant topic of data restoration, by supplying sufficient detail and references. Additionally, this review paper is accompanied by a toolbox that provides a platform to encourage interested students and researchers to further explore the restoration techniques and to help the community move forward. The toolboxes are provided at https://github.com/ImageRestorationToolbox.
Abstract: This paper presents a despeckling method for Sentinel-1 GRD images based on the recently proposed "SAR2SAR" framework, a self-supervised training strategy. Training the deep neural network on collections of Sentinel-1 GRD images leads to a despeckling algorithm that is robust to space-variant spatial correlations of speckle. Despeckled images improve the detection of structures like narrow rivers. We apply a detector based on exogenous information and a linear-features detector, and show that rivers are better segmented when the processing chain is applied to images pre-processed by our despeckling neural network.
Abstract: Deep learning approaches show unprecedented results for speckle reduction in SAR amplitude images. The wide availability of multi-temporal stacks of SAR images can further improve the quality of denoising. In this paper, we propose a flexible yet efficient way to integrate temporal information into a deep neural network for speckle suppression. Archives provide access to long time series of SAR images, from which multi-temporal averages can be computed with virtually no remaining speckle fluctuations. The proposed method combines this multi-temporal average and the image at a given date in the form of a ratio image and uses a state-of-the-art neural network to remove the speckle in this ratio image. This simple strategy is shown to offer a noticeable improvement compared to filtering the original image without knowledge of the multi-temporal average.
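A minimal sketch of the ratio-image scheme described above (not the authors' code): divide the image at the date of interest by the nearly speckle-free multi-temporal average, despeckle the ratio with a single-image network, and multiply the result back. The `despeckler` callable and the epsilon guard are illustrative assumptions.

```python
import numpy as np

def ratio_image_filter(stack, t, despeckler, eps=1e-6):
    """stack: (T, H, W) intensity time series; t: index of the date to filter."""
    temporal_mean = stack.mean(axis=0)         # virtually speckle-free average
    ratio = stack[t] / (temporal_mean + eps)   # ratio image dominated by speckle
    clean_ratio = despeckler(ratio)            # single-image neural despeckling
    return clean_ratio * temporal_mean         # recombined filtered image
```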