Lorenzo Bruzzone

OBSUM: An object-based spatial unmixing model for spatiotemporal fusion of remote sensing images

Oct 14, 2023
Houcai Guo, Dingqi Ye, Lorenzo Bruzzone

Spatiotemporal fusion aims to improve both the spatial and temporal resolution of remote sensing images, thus facilitating time-series analysis at a fine spatial scale. However, several important issues limit the application of current spatiotemporal fusion methods. First, most spatiotemporal fusion methods are based on pixel-level computation, which neglects the valuable object-level information of the land surface. Moreover, many existing methods cannot accurately retrieve strong temporal changes between the high-resolution image available at the base date and the predicted one. This study proposes an Object-Based Spatial Unmixing Model (OBSUM), which incorporates object-based image analysis and spatial unmixing to overcome the two abovementioned problems. OBSUM consists of one preprocessing step and three fusion steps, i.e., object-level unmixing, object-level residual compensation, and pixel-level residual compensation. OBSUM can be applied using only one fine image at the base date and one coarse image at the prediction date, without the need for a coarse image at the base date. The performance of OBSUM was compared with that of five representative spatiotemporal fusion methods. The experimental results demonstrated that OBSUM outperformed the other methods in terms of both accuracy indices and visual effects over the time series. Furthermore, OBSUM achieved satisfactory results in two typical remote sensing applications. Therefore, it has great potential to generate accurate, high-resolution time-series observations to support various remote sensing applications.
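
To make the unmixing step concrete, here is a minimal sketch of the linear spatial unmixing at the core of methods like OBSUM: each coarse pixel is modeled as an area-weighted mixture of class reflectances, which are then recovered by least squares. This is an illustrative toy, not the authors' implementation; the `fractions`/`coarse`/`unmix_band` names are invented here, and the object-level and pixel-level residual compensation steps are omitted.

```python
# A minimal sketch of linear spatial unmixing, not the authors' OBSUM code.
# Assumption: `fractions` holds, for each coarse pixel, the area fraction of
# each land-cover class inside it (e.g. from segmenting the fine base image);
# `coarse` holds the coarse reflectance of those pixels for one band.
import numpy as np

def unmix_band(fractions: np.ndarray, coarse: np.ndarray) -> np.ndarray:
    """Solve coarse = fractions @ endmembers in the least-squares sense.

    fractions: (n_coarse_pixels, n_classes), rows sum to 1.
    coarse:    (n_coarse_pixels,) reflectance at the prediction date.
    Returns    (n_classes,) estimated class reflectance.
    """
    endmembers, *_ = np.linalg.lstsq(fractions, coarse, rcond=None)
    return endmembers

# Toy example: 4 coarse pixels, 2 classes.
fractions = np.array([[0.8, 0.2],
                      [0.5, 0.5],
                      [0.3, 0.7],
                      [0.1, 0.9]])
true_refl = np.array([0.12, 0.35])      # hypothetical class reflectances
coarse = fractions @ true_refl          # simulated coarse observations
print(unmix_band(fractions, coarse))    # recovers ~[0.12, 0.35]
```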

Incomplete Multimodal Learning for Remote Sensing Data Fusion

Apr 22, 2023
Yuxing Chen, Maofan Zhao, Lorenzo Bruzzone

The mechanism of connecting multimodal signals through the self-attention operation is a key factor in the success of multimodal Transformer networks in remote sensing data fusion tasks. However, traditional approaches assume access to all modalities during both training and inference, which can lead to severe degradation when dealing with modal-incomplete inputs in downstream applications. To address this limitation, we propose a novel model for incomplete multimodal learning in the context of remote sensing data fusion. The approach can be used in both supervised and self-supervised pretraining paradigms, and it leverages additional learned fusion tokens, in combination with Bi-LSTM attention and masked self-attention mechanisms, to collect multimodal signals. It employs reconstruction and contrastive losses to facilitate fusion during pretraining, while allowing random modality combinations as inputs during network training. Our approach delivers state-of-the-art performance on two multimodal datasets for building instance/semantic segmentation and land-cover mapping when dealing with incomplete inputs during inference.
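
As a rough illustration of the fusion-token idea, the sketch below concatenates a few learned fusion tokens with whichever modality token sets happen to be present and runs standard self-attention over them, with random modality dropout at training time. All names and sizes, and the use of a plain PyTorch Transformer encoder, are assumptions made here for illustration; the paper's Bi-LSTM attention and masked self-attention components are not reproduced.

```python
# A hedged sketch of learned fusion tokens attending over the available
# modalities; module names and sizes are illustrative assumptions.
import torch
import torch.nn as nn

class FusionTokenEncoder(nn.Module):
    def __init__(self, dim=64, n_fusion=4, n_heads=4):
        super().__init__()
        self.fusion = nn.Parameter(torch.randn(1, n_fusion, dim))
        layer = nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, modalities):
        """modalities: list of (B, N_i, dim) token tensors; absent ones omitted."""
        B = modalities[0].shape[0]
        tokens = torch.cat([self.fusion.expand(B, -1, -1), *modalities], dim=1)
        out = self.encoder(tokens)
        return out[:, : self.fusion.shape[1]]   # the fused representation

# Training-time modality dropout: randomly hide modalities so the network
# learns to cope with incomplete inputs at inference.
opt_tokens = torch.randn(2, 16, 64)   # e.g. optical patch tokens
sar_tokens = torch.randn(2, 16, 64)   # e.g. SAR patch tokens
present = [m for m in (opt_tokens, sar_tokens) if torch.rand(1).item() > 0.3]
model = FusionTokenEncoder()
fused = model(present or [opt_tokens])  # always keep at least one modality
print(fused.shape)                      # torch.Size([2, 4, 64])
```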

Unsupervised CD in satellite image time series by contrastive learning and feature tracking

Apr 22, 2023
Yuxing Chen, Lorenzo Bruzzone

While contrastive learning has significantly improved the performance of unsupervised change detection techniques, it currently focuses only on the bi-temporal change detection scenario. Previous state-of-the-art models for image time-series change detection often use features obtained by learning for clustering, or train a model from scratch using pseudo labels tailored to each scene. However, these approaches fail to exploit the spatial-temporal information of image time-series or to generalize to unseen scenarios. In this work, we propose a two-stage approach to unsupervised change detection in satellite image time-series using contrastive learning with feature tracking. By deriving pseudo labels from pre-trained models and using feature tracking to propagate them among the image time-series, we improve the consistency of our pseudo labels and address the challenges of seasonal changes in long-term remote sensing image time-series. We adopt the self-training algorithm with ConvLSTM on the obtained pseudo labels: we first use a supervised contrastive loss and contrastive random walks to further improve the feature correspondence in space-time, and then fine-tune a fully connected layer on the pre-trained multi-temporal features to generate the final change maps. Through comprehensive experiments on two datasets, we demonstrate consistent improvements in accuracy in both fitting and inference scenarios.
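
The feature-tracking step can be pictured as a soft nearest-neighbour transfer of pseudo labels between dates: each pixel at the target date takes a similarity-weighted average of the labels of pixels at the source date. The toy below is a sketch under that reading; the flat (N, D) feature layout, the temperature, and all names are assumptions, not the authors' algorithm.

```python
# A toy sketch of label propagation by feature similarity, in the spirit of
# feature tracking; not the authors' implementation.
import torch
import torch.nn.functional as F

def propagate_labels(feat_src, feat_dst, labels_src, temperature=0.07):
    """Soft nearest-neighbour transfer of labels from one date to the next.

    feat_src, feat_dst: (N, D) L2-normalised per-pixel features.
    labels_src:         (N, C) one-hot or soft pseudo labels at the source date.
    Returns             (N, C) propagated soft labels at the target date.
    """
    sim = feat_dst @ feat_src.t() / temperature   # (N, N) affinities
    weights = F.softmax(sim, dim=1)               # each target pixel attends to sources
    return weights @ labels_src

feat_t0 = F.normalize(torch.randn(100, 32), dim=1)
feat_t1 = F.normalize(torch.randn(100, 32), dim=1)
labels_t0 = F.one_hot(torch.randint(0, 2, (100,)), 2).float()
print(propagate_labels(feat_t0, feat_t1, labels_t0).shape)  # torch.Size([100, 2])
```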

Joint Spatio-Temporal Modeling for the Semantic Change Detection in Remote Sensing Images

Dec 17, 2022
Lei Ding, Jing Zhang, Kai Zhang, Haitao Guo, Bing Liu, Lorenzo Bruzzone

Semantic Change Detection (SCD) refers to the task of simultaneously extracting the changed areas and the semantic categories (before and after the changes) in Remote Sensing Images (RSIs). This is more meaningful than Binary Change Detection (BCD), since it enables detailed change analysis in the observed areas. Previous works established triple-branch Convolutional Neural Network (CNN) architectures as the paradigm for SCD. However, it remains challenging to exploit semantic information with a limited amount of change samples. In this work, we jointly consider the spatio-temporal dependencies to improve the accuracy of SCD. First, we propose a Semantic Change Transformer (SCanFormer) to explicitly model the 'from-to' semantic transitions between the bi-temporal RSIs. Then, we introduce a semantic learning scheme that leverages spatio-temporal constraints, which are coherent with the SCD task, to guide the learning of semantic changes. The resulting network (SCanNet) significantly outperforms the baseline method in terms of both the detection of critical semantic changes and the semantic consistency of the obtained bi-temporal results. It achieves state-of-the-art accuracy on two SCD benchmark datasets.
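
One way to picture the kind of spatio-temporal constraint the abstract alludes to is a consistency term that penalises semantic disagreement between the two dates wherever no change is predicted. The sketch below is a hedged illustration under that assumption; the function name and the weighting scheme are invented here and are not taken from the SCanNet code.

```python
# A hedged sketch of a semantic-consistency constraint for SCD; illustrative
# only, not the paper's loss.
import torch
import torch.nn.functional as F

def semantic_consistency_loss(logits_t1, logits_t2, change_prob):
    """Penalise semantic disagreement in regions predicted as unchanged.

    logits_t1, logits_t2: (B, C, H, W) semantic logits for the two dates.
    change_prob:          (B, 1, H, W) predicted change probability.
    """
    p1 = F.softmax(logits_t1, dim=1)
    p2 = F.softmax(logits_t2, dim=1)
    disagreement = (p1 - p2).abs().sum(dim=1, keepdim=True)  # (B, 1, H, W)
    return ((1.0 - change_prob) * disagreement).mean()

logits_a = torch.randn(2, 6, 64, 64)   # 6 hypothetical LCLU classes
logits_b = torch.randn(2, 6, 64, 64)
change = torch.rand(2, 1, 64, 64)
print(semantic_consistency_loss(logits_a, logits_b, change))
```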

Bi-Temporal Semantic Reasoning for the Semantic Change Detection in HR Remote Sensing Images

Aug 28, 2021
Lei Ding, Haitao Guo, Sicong Liu, Lichao Mou, Jing Zhang, Lorenzo Bruzzone

Semantic change detection (SCD) extends the multi-class change detection (MCD) task to provide not only the change locations but also the detailed land-cover/land-use (LCLU) categories before and after the observation intervals. This fine-grained semantic change information is very useful in many applications. Recent studies indicate that SCD can be modeled through a triple-branch Convolutional Neural Network (CNN) that contains two temporal branches and a change branch. However, in this architecture, the communication between the temporal branches and the change branch is insufficient. To overcome the limitations of existing methods, we propose a novel CNN architecture for SCD, in which the semantic temporal features are merged in a deep CD unit. Furthermore, we elaborate on this architecture to reason about the bi-temporal semantic correlations. The resulting Bi-temporal Semantic Reasoning Network (Bi-SRNet) contains two types of semantic reasoning blocks, which reason about both single-temporal and cross-temporal semantic correlations, as well as a novel loss function to improve the semantic consistency of the change detection results. Experimental results on a benchmark dataset show that the proposed architecture obtains significant accuracy improvements over existing approaches, while the added designs in the Bi-SRNet further improve the segmentation of both the semantic categories and the changed areas. The code is available at: github.com/ggsDing/Bi-SRNet.

* Manuscript to ISPRS Journal of Photogrammetry and Remote Sensing 
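
As a rough picture of cross-temporal semantic reasoning, the sketch below lets the features of one date attend to those of the other through standard multi-head attention with a residual connection. Layer sizes and names are assumptions made for illustration; the actual reasoning blocks and loss are defined in the linked repository.

```python
# A minimal sketch of cross-temporal reasoning via attention; illustrative
# assumptions throughout, not the released Bi-SRNet code.
import torch
import torch.nn as nn

class CrossTemporalReasoning(nn.Module):
    """Let features of one date attend to features of the other date."""
    def __init__(self, dim=64, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, feat_a, feat_b):
        # feat_a, feat_b: (B, C, H, W) maps turned into (B, H*W, C) tokens.
        B, C, H, W = feat_a.shape
        qa = feat_a.flatten(2).transpose(1, 2)
        kb = feat_b.flatten(2).transpose(1, 2)
        fused, _ = self.attn(qa, kb, kb)   # queries from date A, keys/values from date B
        out = self.norm(qa + fused)        # residual connection
        return out.transpose(1, 2).reshape(B, C, H, W)

block = CrossTemporalReasoning()
f1, f2 = torch.randn(2, 64, 16, 16), torch.randn(2, 64, 16, 16)
print(block(f1, f2).shape)  # torch.Size([2, 64, 16, 16])
```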

Looking Outside the Window: Wide-Context Transformer for the Semantic Segmentation of High-Resolution Remote Sensing Images

Jul 14, 2021
Lei Ding, Dong Lin, Shaofu Lin, Jing Zhang, Xiaojie Cui, Yuebin Wang, Hao Tang, Lorenzo Bruzzone

Long-range context information is crucial for the semantic segmentation of High-Resolution (HR) Remote Sensing Images (RSIs). However, the image cropping operations commonly used to train neural networks limit the perception of long-range context in large RSIs. To break this limitation, we propose a Wide-Context Network (WiCoNet) for the semantic segmentation of HR RSIs. In the WiCoNet, apart from a conventional feature extraction network that aggregates local information, an extra context branch is designed to explicitly model the spatial information in a larger image area. The information between the two branches is communicated through a Context Transformer, a novel design derived from the Vision Transformer to model long-range context correlations. Ablation studies and comparative experiments conducted on several benchmark datasets prove the effectiveness of the proposed method. In addition, we present a new Beijing Land-Use (BLU) dataset, a large-scale HR satellite dataset provided with high-quality and fine-grained reference labels, which can boost future studies in this field.
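
The two-branch idea can be sketched as a local crop whose tokens query a wider, downsampled context window through cross-attention, so each local pixel can gather information from far outside its own crop. Everything below (the stand-in convolutional branches, names, and sizes) is an illustrative assumption rather than the WiCoNet code.

```python
# A hedged sketch of a local branch querying a wide-context branch through
# cross-attention; illustrative assumptions, not the paper's architecture.
import torch
import torch.nn as nn

class ContextTransformerSketch(nn.Module):
    def __init__(self, dim=48, n_heads=4):
        super().__init__()
        self.local = nn.Conv2d(3, dim, 3, padding=1)    # stand-in local branch
        self.context = nn.Conv2d(3, dim, 3, padding=1)  # stand-in context branch
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, crop, wide):
        # crop: (B, 3, h, w) local patch; wide: (B, 3, H, W) larger, downsampled area.
        q = self.local(crop).flatten(2).transpose(1, 2)     # (B, h*w, dim)
        kv = self.context(wide).flatten(2).transpose(1, 2)  # (B, H*W, dim)
        ctx, _ = self.attn(q, kv, kv)   # local tokens gather wide context
        B, _, h, w = crop.shape
        return (q + ctx).transpose(1, 2).reshape(B, -1, h, w)

model = ContextTransformerSketch()
crop = torch.randn(1, 3, 32, 32)
wide = torch.randn(1, 3, 64, 64)   # e.g. a larger area resized down
print(model(crop, wide).shape)     # torch.Size([1, 48, 32, 32])
```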
