Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xiao Xiang Zhu

Technical University of Munich

GAMUS: A Geometry-aware Multi-modal Semantic Segmentation Benchmark for Remote Sensing Data

May 24, 2023

Zhitong Xiong, Sining Chen, Yi Wang, Lichao Mou, Xiao Xiang Zhu

Abstract:Geometric information in the normalized digital surface models (nDSM) is highly correlated with the semantic class of the land cover. Exploiting two modalities (RGB and nDSM (height)) jointly has great potential to improve the segmentation performance. However, it is still an under-explored field in remote sensing due to the following challenges. First, the scales of existing datasets are relatively small and the diversity of existing datasets is limited, which restricts the ability of validation. Second, there is a lack of unified benchmarks for performance assessment, which leads to difficulties in comparing the effectiveness of different models. Last, sophisticated multi-modal semantic segmentation methods have not been deeply explored for remote sensing data. To cope with these challenges, in this paper, we introduce a new remote-sensing benchmark dataset for multi-modal semantic segmentation based on RGB-Height (RGB-H) data. Towards a fair and comprehensive analysis of existing methods, the proposed benchmark consists of 1) a large-scale dataset including co-registered RGB and nDSM pairs and pixel-wise semantic labels; 2) a comprehensive evaluation and analysis of existing multi-modal fusion strategies for both convolutional and Transformer-based networks on remote sensing data. Furthermore, we propose a novel and effective Transformer-based intermediary multi-modal fusion (TIMF) module to improve the semantic segmentation performance through adaptive token-level multi-modal fusion.The designed benchmark can foster future research on developing new methods for multi-modal learning on remote sensing data. Extensive analyses of those methods are conducted and valuable insights are provided through the experimental results. Code for the benchmark and baselines can be accessed at \url{https://github.com/EarthNets/RSI-MMSegmentation}.

* 13 pages

Via

Access Paper or Ask Questions

Basis Pursuit Denoising via Recurrent Neural Network Applied to Super-resolving SAR Tomography

May 23, 2023

Kun Qian, Yuanyuan Wang, Peter Jung, Yilei Shi, Xiao Xiang Zhu

Abstract:Finding sparse solutions of underdetermined linear systems commonly requires the solving of L1 regularized least squares minimization problem, which is also known as the basis pursuit denoising (BPDN). They are computationally expensive since they cannot be solved analytically. An emerging technique known as deep unrolling provided a good combination of the descriptive ability of neural networks, explainable, and computational efficiency for BPDN. Many unrolled neural networks for BPDN, e.g. learned iterative shrinkage thresholding algorithm and its variants, employ shrinkage functions to prune elements with small magnitude. Through experiments on synthetic aperture radar tomography (TomoSAR), we discover the shrinkage step leads to unavoidable information loss in the dynamics of networks and degrades the performance of the model. We propose a recurrent neural network (RNN) with novel sparse minimal gated units (SMGUs) to solve the information loss issue. The proposed RNN architecture with SMGUs benefits from incorporating historical information into optimization, and thus effectively preserves full information in the final output. Taking TomoSAR inversion as an example, extensive simulations demonstrated that the proposed RNN outperforms the state-of-the-art deep learning-based algorithm in terms of super-resolution power as well as generalization ability. It achieved a 10% to 20% higher double scatterers detection rate and is less sensitive to phase and amplitude ratio differences between scatterers. Test on real TerraSAR-X spotlight images also shows a high-quality 3-D reconstruction of the test site.

Via

Access Paper or Ask Questions

Artificial intelligence to advance Earth observation: a perspective

May 15, 2023

Devis Tuia, Konrad Schindler, Begüm Demir, Gustau Camps-Valls, Xiao Xiang Zhu, Mrinalini Kochupillai, Sašo Džeroski, Jan N. van Rijn, Holger H. Hoos, Fabio Del Frate(+5 more)

Abstract:Earth observation (EO) is a prime instrument for monitoring land and ocean processes, studying the dynamics at work, and taking the pulse of our planet. This article gives a bird's eye view of the essential scientific tools and approaches informing and supporting the transition from raw EO data to usable EO-based information. The promises, as well as the current challenges of these developments, are highlighted under dedicated sections. Specifically, we cover the impact of (i) Computer vision; (ii) Machine learning; (iii) Advanced processing and computing; (iv) Knowledge-based AI; (v) Explainable AI and causal inference; (vi) Physics-aware models; (vii) User-centric approaches; and (viii) the much-needed discussion of ethical and societal issues related to the massive use of ML technologies in EO.

Via

Access Paper or Ask Questions

Vision-Language Models in Remote Sensing: Current Progress and Future Trends

May 09, 2023

Congcong Wen, Yuan Hu, Xiang Li, Zhenghang Yuan, Xiao Xiang Zhu

Figure 1 for Vision-Language Models in Remote Sensing: Current Progress and Future Trends

Figure 2 for Vision-Language Models in Remote Sensing: Current Progress and Future Trends

Figure 3 for Vision-Language Models in Remote Sensing: Current Progress and Future Trends

Figure 4 for Vision-Language Models in Remote Sensing: Current Progress and Future Trends

Abstract:The remarkable achievements of ChatGPT and GPT-4 have sparked a wave of interest and research in the field of large language models for Artificial General Intelligence (AGI). These models provide us with intelligent solutions that are more similar to human thinking, enabling us to use general artificial intelligence to solve problems in various applications. However, in the field of remote sensing, the scientific literature on the implementation of AGI remains relatively scant. Existing AI-related research primarily focuses on visual understanding tasks while neglecting the semantic understanding of the objects and their relationships. This is where vision-language models excel, as they enable reasoning about images and their associated textual descriptions, allowing for a deeper understanding of the underlying semantics. Vision-language models can go beyond recognizing the objects in an image and can infer the relationships between them, as well as generate natural language descriptions of the image. This makes them better suited for tasks that require both visual and textual understanding, such as image captioning, text-based image retrieval, and visual question answering. This paper provides a comprehensive review of the research on vision-language models in remote sensing, summarizing the latest progress, highlighting the current challenges, and identifying potential research opportunities. Specifically, we review the application of vision-language models in several mainstream remote sensing tasks, including image captioning, text-based image generation, text-based image retrieval, visual question answering, scene classification, semantic segmentation, and object detection. For each task, we briefly describe the task background and review some representative works. Finally, we summarize the limitations of existing work and provide some possible directions for future development.

Via

Access Paper or Ask Questions

High Quality Large-Scale 3-D Urban Mapping with Multi-Master TomoSAR

May 08, 2023

Yilei Shi, Richard Bamler, Yuanyuan Wang, Xiao Xiang Zhu

Abstract:Multi-baseline interferometric synthetic aperture radar (InSAR) techniques are effective approaches for retrieving the 3-D information of urban areas. In order to obtain a plausible reconstruction, it is necessary to use large-stack interferograms. Hence, these methods are commonly not appropriate for large-scale 3-D urban mapping using TanDEM-X data where only a few acquisitions are available in average for each city. This work proposes a new SAR tomographic processing framework to work with those extremely small stacks, which integrates the non-local filtering into SAR tomography inversion. The applicability of the algorithm is demonstrated using a TanDEM-X multi-baseline stack with 5 bistatic interferograms over the whole city of Munich, Germany. Systematic comparison of our result with airborne LiDAR data shows that the relative height accuracy of two third buildings is within two meters, which outperforms the TanDEM-X raw DEM. The promising performance of the proposed algorithm paved the first step towards high quality large-scale 3-D urban mapping.

* 7 pages. arXiv admin note: substantial text overlap with arXiv:2003.07803

Via

Access Paper or Ask Questions

Deep Unsupervised Learning for 3D ALS Point Clouds Change Detection

May 05, 2023

Iris de Gélis, Sudipan Saha, Muhammad Shahzad, Thomas Corpetti, Sébastien Lefèvre, Xiao Xiang Zhu

Figure 1 for Deep Unsupervised Learning for 3D ALS Point Clouds Change Detection

Figure 2 for Deep Unsupervised Learning for 3D ALS Point Clouds Change Detection

Figure 3 for Deep Unsupervised Learning for 3D ALS Point Clouds Change Detection

Figure 4 for Deep Unsupervised Learning for 3D ALS Point Clouds Change Detection

Abstract:Change detection from traditional optical images has limited capability to model the changes in the height or shape of objects. Change detection using 3D point cloud aerial LiDAR survey data can fill this gap by providing critical depth information. While most existing machine learning based 3D point cloud change detection methods are supervised, they severely depend on the availability of annotated training data, which is in practice a critical point. To circumnavigate this dependence, we propose an unsupervised 3D point cloud change detection method mainly based on self-supervised learning using deep clustering and contrastive learning. The proposed method also relies on an adaptation of deep change vector analysis to 3D point cloud via nearest point comparison. Experiments conducted on a publicly available real dataset show that the proposed method obtains higher performance in comparison to the traditional unsupervised methods, with a gain of about 9% in mean accuracy (to reach more than 85%). Thus, it appears to be a relevant choice in scenario where prior knowledge (labels) is not ensured.

* This work has been submitted to Elsevier for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Via

Access Paper or Ask Questions

UnCRtainTS: Uncertainty Quantification for Cloud Removal in Optical Satellite Time Series

Apr 11, 2023

Patrick Ebel, Vivien Sainte Fare Garnot, Michael Schmitt, Jan Dirk Wegner, Xiao Xiang Zhu

Abstract:Clouds and haze often occlude optical satellite images, hindering continuous, dense monitoring of the Earth's surface. Although modern deep learning methods can implicitly learn to ignore such occlusions, explicit cloud removal as pre-processing enables manual interpretation and allows training models when only few annotations are available. Cloud removal is challenging due to the wide range of occlusion scenarios -- from scenes partially visible through haze, to completely opaque cloud coverage. Furthermore, integrating reconstructed images in downstream applications would greatly benefit from trustworthy quality assessment. In this paper, we introduce UnCRtainTS, a method for multi-temporal cloud removal combining a novel attention-based architecture, and a formulation for multivariate uncertainty prediction. These two components combined set a new state-of-the-art performance in terms of image reconstruction on two public cloud removal datasets. Additionally, we show how the well-calibrated predicted uncertainties enable a precise control of the reconstruction quality.

Via

Access Paper or Ask Questions

Multilingual Augmentation for Robust Visual Question Answering in Remote Sensing Images

Apr 07, 2023

Zhenghang Yuan, Lichao Mou, Xiao Xiang Zhu

Figure 1 for Multilingual Augmentation for Robust Visual Question Answering in Remote Sensing Images

Figure 2 for Multilingual Augmentation for Robust Visual Question Answering in Remote Sensing Images

Figure 3 for Multilingual Augmentation for Robust Visual Question Answering in Remote Sensing Images

Figure 4 for Multilingual Augmentation for Robust Visual Question Answering in Remote Sensing Images

Abstract:Aiming at answering questions based on the content of remotely sensed images, visual question answering for remote sensing data (RSVQA) has attracted much attention nowadays. However, previous works in RSVQA have focused little on the robustness of RSVQA. As we aim to enhance the reliability of RSVQA models, how to learn robust representations against new words and different question templates with the same meaning is the key challenge. With the proposed augmented dataset, we are able to obtain more questions in addition to the original ones with the same meaning. To make better use of this information, in this study, we propose a contrastive learning strategy for training robust RSVQA models against diverse question templates and words. Experimental results demonstrate that the proposed augmented dataset is effective in improving the robustness of the RSVQA model. In addition, the contrastive learning strategy performs well on the low resolution (LR) dataset.

* This paper was submitted to the JURSE 2023 conference on November 5, 2022

Via

Access Paper or Ask Questions

Universal Domain Adaptation for Remote Sensing Image Scene Classification

Jan 26, 2023

Qingsong Xu, Yilei Shi, Xin Yuan, Xiao Xiang Zhu

Abstract:The domain adaptation (DA) approaches available to date are usually not well suited for practical DA scenarios of remote sensing image classification, since these methods (such as unsupervised DA) rely on rich prior knowledge about the relationship between label sets of source and target domains, and source data are often not accessible due to privacy or confidentiality issues. To this end, we propose a practical universal domain adaptation setting for remote sensing image scene classification that requires no prior knowledge on the label sets. Furthermore, a novel universal domain adaptation method without source data is proposed for cases when the source data is unavailable. The architecture of the model is divided into two parts: the source data generation stage and the model adaptation stage. The first stage estimates the conditional distribution of source data from the pre-trained model using the knowledge of class-separability in the source domain and then synthesizes the source data. With this synthetic source data in hand, it becomes a universal DA task to classify a target sample correctly if it belongs to any category in the source label set, or mark it as ``unknown" otherwise. In the second stage, a novel transferable weight that distinguishes the shared and private label sets in each domain promotes the adaptation in the automatically discovered shared label set and recognizes the ``unknown'' samples successfully. Empirical results show that the proposed model is effective and practical for remote sensing image scene classification, regardless of whether the source data is available or not. The code is available at https://github.com/zhu-xlab/UniDA.

* 15 pages, 6 figures, IEEE Transactions on Geoscience and Remote Sensing

Via

Access Paper or Ask Questions

High-Resolution Cloud Removal with Multi-Modal and Multi-Resolution Data Fusion: A New Baseline and Benchmark

Jan 09, 2023

Fang Xu, Yilei Shi, Patrick Ebel, Wen Yang, Xiao Xiang Zhu

Figure 1 for High-Resolution Cloud Removal with Multi-Modal and Multi-Resolution Data Fusion: A New Baseline and Benchmark

Figure 2 for High-Resolution Cloud Removal with Multi-Modal and Multi-Resolution Data Fusion: A New Baseline and Benchmark

Figure 3 for High-Resolution Cloud Removal with Multi-Modal and Multi-Resolution Data Fusion: A New Baseline and Benchmark

Figure 4 for High-Resolution Cloud Removal with Multi-Modal and Multi-Resolution Data Fusion: A New Baseline and Benchmark

Abstract:In this paper, we introduce Planet-CR, a benchmark dataset for high-resolution cloud removal with multi-modal and multi-resolution data fusion. Planet-CR is the first public dataset for cloud removal to feature globally sampled high resolution optical observations, in combination with paired radar measurements as well as pixel-level land cover annotations. It provides solid basis for exhaustive evaluation in terms of generating visually pleasing textures and semantically meaningful structures. With this dataset, we consider the problem of cloud removal in high resolution optical remote sensing imagery by integrating multi-modal and multi-resolution information. Existing multi-modal data fusion based methods, which assume the image pairs are aligned pixel-to-pixel, are hence not appropriate for this problem. To this end, we design a new baseline named Align-CR to perform the low-resolution SAR image guided high-resolution optical image cloud removal. It implicitly aligns the multi-modal and multi-resolution data during the reconstruction process to promote the cloud removal performance. The experimental results demonstrate that the proposed Align-CR method gives the best performance in both visual recovery quality and semantic recovery quality. The project is available at https://github.com/zhu-xlab/Planet-CR, and hope this will inspire future research.

Via

Access Paper or Ask Questions