Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Liangpei Zhang

Remote Sensing ChatGPT: Solving Remote Sensing Tasks with ChatGPT and Visual Models

Jan 17, 2024

Haonan Guo, Xin Su, Chen Wu, Bo Du, Liangpei Zhang, Deren Li

Figure 1 for Remote Sensing ChatGPT: Solving Remote Sensing Tasks with ChatGPT and Visual Models

Figure 2 for Remote Sensing ChatGPT: Solving Remote Sensing Tasks with ChatGPT and Visual Models

Abstract:Recently, the flourishing large language models(LLM), especially ChatGPT, have shown exceptional performance in language understanding, reasoning, and interaction, attracting users and researchers from multiple fields and domains. Although LLMs have shown great capacity to perform human-like task accomplishment in natural language and natural image, their potential in handling remote sensing interpretation tasks has not yet been fully explored. Moreover, the lack of automation in remote sensing task planning hinders the accessibility of remote sensing interpretation techniques, especially to non-remote sensing experts from multiple research fields. To this end, we present Remote Sensing ChatGPT, an LLM-powered agent that utilizes ChatGPT to connect various AI-based remote sensing models to solve complicated interpretation tasks. More specifically, given a user request and a remote sensing image, we utilized ChatGPT to understand user requests, perform task planning according to the tasks' functions, execute each subtask iteratively, and generate the final response according to the output of each subtask. Considering that LLM is trained with natural language and is not capable of directly perceiving visual concepts as contained in remote sensing images, we designed visual cues that inject visual information into ChatGPT. With Remote Sensing ChatGPT, users can simply send a remote sensing image with the corresponding request, and get the interpretation results as well as language feedback from Remote Sensing ChatGPT. Experiments and examples show that Remote Sensing ChatGPT can tackle a wide range of remote sensing tasks and can be extended to more tasks with more sophisticated models such as the remote sensing foundation model. The code and demo of Remote Sensing ChatGPT is publicly available at https://github.com/HaonanGuo/Remote-Sensing-ChatGPT .

* The manuscript is submitted to IEEE International Geoscience and Remote Sensing Symposium(IGARSS2024). Looking forward to seeing you in July!

Via

Access Paper or Ask Questions

Deep Blind Super-Resolution for Satellite Video

Jan 13, 2024

Yi Xiao, Qiangqiang Yuan, Qiang Zhang, Liangpei Zhang

Figure 1 for Deep Blind Super-Resolution for Satellite Video

Figure 2 for Deep Blind Super-Resolution for Satellite Video

Figure 3 for Deep Blind Super-Resolution for Satellite Video

Figure 4 for Deep Blind Super-Resolution for Satellite Video

Abstract:Recent efforts have witnessed remarkable progress in Satellite Video Super-Resolution (SVSR). However, most SVSR methods usually assume the degradation is fixed and known, e.g., bicubic downsampling, which makes them vulnerable in real-world scenes with multiple and unknown degradations. To alleviate this issue, blind SR has thus become a research hotspot. Nevertheless, existing approaches are mainly engaged in blur kernel estimation while losing sight of another critical aspect for VSR tasks: temporal compensation, especially compensating for blurry and smooth pixels with vital sharpness from severely degraded satellite videos. Therefore, this paper proposes a practical Blind SVSR algorithm (BSVSR) to explore more sharp cues by considering the pixel-wise blur levels in a coarse-to-fine manner. Specifically, we employed multi-scale deformable convolution to coarsely aggregate the temporal redundancy into adjacent frames by window-slid progressive fusion. Then the adjacent features are finely merged into mid-feature using deformable attention, which measures the blur levels of pixels and assigns more weights to the informative pixels, thus inspiring the representation of sharpness. Moreover, we devise a pyramid spatial transformation module to adjust the solution space of sharp mid-feature, resulting in flexible feature adaptation in multi-level domains. Quantitative and qualitative evaluations on both simulated and real-world satellite videos demonstrate that our BSVSR performs favorably against state-of-the-art non-blind and blind SR models. Code will be available at https://github.com/XY-boy/Blind-Satellite-VSR

* IEEE Transactions on Geoscience and Remote Sensing, vol. 61, pp. 1-16, 2023, Art no. 5516316
* Published in IEEE TGRS

Via

Access Paper or Ask Questions

Cross-Scope Spatial-Spectral Information Aggregation for Hyperspectral Image Super-Resolution

Nov 29, 2023

Shi Chen, Lefei Zhang, Liangpei Zhang

Figure 1 for Cross-Scope Spatial-Spectral Information Aggregation for Hyperspectral Image Super-Resolution

Figure 2 for Cross-Scope Spatial-Spectral Information Aggregation for Hyperspectral Image Super-Resolution

Figure 3 for Cross-Scope Spatial-Spectral Information Aggregation for Hyperspectral Image Super-Resolution

Figure 4 for Cross-Scope Spatial-Spectral Information Aggregation for Hyperspectral Image Super-Resolution

Abstract:Hyperspectral image super-resolution has attained widespread prominence to enhance the spatial resolution of hyperspectral images. However, convolution-based methods have encountered challenges in harnessing the global spatial-spectral information. The prevailing transformer-based methods have not adequately captured the long-range dependencies in both spectral and spatial dimensions. To alleviate this issue, we propose a novel cross-scope spatial-spectral Transformer (CST) to efficiently investigate long-range spatial and spectral similarities for single hyperspectral image super-resolution. Specifically, we devise cross-attention mechanisms in spatial and spectral dimensions to comprehensively model the long-range spatial-spectral characteristics. By integrating global information into the rectangle-window self-attention, we first design a cross-scope spatial self-attention to facilitate long-range spatial interactions. Then, by leveraging appropriately characteristic spatial-spectral features, we construct a cross-scope spectral self-attention to effectively capture the intrinsic correlations among global spectral bands. Finally, we elaborate a concise feed-forward neural network to enhance the feature representation capacity in the Transformer structure. Extensive experiments over three hyperspectral datasets demonstrate that the proposed CST is superior to other state-of-the-art methods both quantitatively and visually. The code is available at \url{https://github.com/Tomchenshi/CST.git}.

Via

Access Paper or Ask Questions

Learning transformer-based heterogeneously salient graph representation for multimodal fusion classification of hyperspectral image and LiDAR data

Nov 17, 2023

Jiaqi Yang, Bo Du, Liangpei Zhang

Abstract:Data collected by different modalities can provide a wealth of complementary information, such as hyperspectral image (HSI) to offer rich spectral-spatial properties, synthetic aperture radar (SAR) to provide structural information about the Earth's surface, and light detection and ranging (LiDAR) to cover altitude information about ground elevation. Therefore, a natural idea is to combine multimodal images for refined and accurate land-cover interpretation. Although many efforts have been attempted to achieve multi-source remote sensing image classification, there are still three issues as follows: 1) indiscriminate feature representation without sufficiently considering modal heterogeneity, 2) abundant features and complex computations associated with modeling long-range dependencies, and 3) overfitting phenomenon caused by sparsely labeled samples. To overcome the above barriers, a transformer-based heterogeneously salient graph representation (THSGR) approach is proposed in this paper. First, a multimodal heterogeneous graph encoder is presented to encode distinctively non-Euclidean structural features from heterogeneous data. Then, a self-attention-free multi-convolutional modulator is designed for effective and efficient long-term dependency modeling. Finally, a mean forward is put forward in order to avoid overfitting. Based on the above structures, the proposed model is able to break through modal gaps to obtain differentiated graph representation with competitive time cost, even for a small fraction of training samples. Experiments and analyses on three benchmark datasets with various state-of-the-art (SOTA) methods show the performance of the proposed approach.

Via

Access Paper or Ask Questions

EDiffSR: An Efficient Diffusion Probabilistic Model for Remote Sensing Image Super-Resolution

Oct 30, 2023

Yi Xiao, Qiangqiang Yuan, Kui Jiang, Jiang He, Xianyu Jin, Liangpei Zhang

Figure 1 for EDiffSR: An Efficient Diffusion Probabilistic Model for Remote Sensing Image Super-Resolution

Figure 2 for EDiffSR: An Efficient Diffusion Probabilistic Model for Remote Sensing Image Super-Resolution

Figure 3 for EDiffSR: An Efficient Diffusion Probabilistic Model for Remote Sensing Image Super-Resolution

Figure 4 for EDiffSR: An Efficient Diffusion Probabilistic Model for Remote Sensing Image Super-Resolution

Abstract:Recently, convolutional networks have achieved remarkable development in remote sensing image Super-Resoltuion (SR) by minimizing the regression objectives, e.g., MSE loss. However, despite achieving impressive performance, these methods often suffer from poor visual quality with over-smooth issues. Generative adversarial networks have the potential to infer intricate details, but they are easy to collapse, resulting in undesirable artifacts. To mitigate these issues, in this paper, we first introduce Diffusion Probabilistic Model (DPM) for efficient remote sensing image SR, dubbed EDiffSR. EDiffSR is easy to train and maintains the merits of DPM in generating perceptual-pleasant images. Specifically, different from previous works using heavy UNet for noise prediction, we develop an Efficient Activation Network (EANet) to achieve favorable noise prediction performance by simplified channel attention and simple gate operation, which dramatically reduces the computational budget. Moreover, to introduce more valuable prior knowledge into the proposed EDiffSR, a practical Conditional Prior Enhancement Module (CPEM) is developed to help extract an enriched condition. Unlike most DPM-based SR models that directly generate conditions by amplifying LR images, the proposed CPEM helps to retain more informative cues for accurate SR. Extensive experiments on four remote sensing datasets demonstrate that EDiffSR can restore visual-pleasant images on simulated and real-world remote sensing images, both quantitatively and qualitatively. The code of EDiffSR will be available at https://github.com/XY-boy/EDiffSR

* Submitted to IEEE TGRS

Via

Access Paper or Ask Questions

A Unified Remote Sensing Anomaly Detector Across Modalities and Scenes via Deviation Relationship Learning

Oct 11, 2023

Jingtao Li, Xinyu Wang, Hengwei Zhao, Liangpei Zhang, Yanfei Zhong

Figure 1 for A Unified Remote Sensing Anomaly Detector Across Modalities and Scenes via Deviation Relationship Learning

Figure 2 for A Unified Remote Sensing Anomaly Detector Across Modalities and Scenes via Deviation Relationship Learning

Figure 3 for A Unified Remote Sensing Anomaly Detector Across Modalities and Scenes via Deviation Relationship Learning

Figure 4 for A Unified Remote Sensing Anomaly Detector Across Modalities and Scenes via Deviation Relationship Learning

Abstract:Remote sensing anomaly detector can find the objects deviating from the background as potential targets. Given the diversity in earth anomaly types, a unified anomaly detector across modalities and scenes should be cost-effective and flexible to new earth observation sources and anomaly types. However, the current anomaly detectors are limited to a single modality and single scene, since they aim to learn the varying background distribution. Motivated by the universal anomaly deviation pattern, in that anomalies exhibit deviations from their local context, we exploit this characteristic to build a unified anomaly detector. Firstly, we reformulate the anomaly detection task as an undirected bilayer graph based on the deviation relationship, where the anomaly score is modeled as the conditional probability, given the pattern of the background and normal objects. The learning objective is then expressed as a conditional probability ranking problem. Furthermore, we design an instantiation of the reformulation in the data, architecture, and optimization aspects. Simulated spectral and spatial anomalies drive the instantiated architecture. The model is optimized directly for the conditional probability ranking. The proposed model was validated in five modalities including the hyperspectral, visible light, synthetic aperture radar (SAR), infrared and low light to show its unified detection ability.

* Journal paper

Via

Access Paper or Ask Questions

Scalable Multi-Temporal Remote Sensing Change Data Generation via Simulating Stochastic Change Process

Sep 29, 2023

Zhuo Zheng, Shiqi Tian, Ailong Ma, Liangpei Zhang, Yanfei Zhong

Figure 1 for Scalable Multi-Temporal Remote Sensing Change Data Generation via Simulating Stochastic Change Process

Figure 2 for Scalable Multi-Temporal Remote Sensing Change Data Generation via Simulating Stochastic Change Process

Figure 3 for Scalable Multi-Temporal Remote Sensing Change Data Generation via Simulating Stochastic Change Process

Figure 4 for Scalable Multi-Temporal Remote Sensing Change Data Generation via Simulating Stochastic Change Process

Abstract:Understanding the temporal dynamics of Earth's surface is a mission of multi-temporal remote sensing image analysis, significantly promoted by deep vision models with its fuel -- labeled multi-temporal images. However, collecting, preprocessing, and annotating multi-temporal remote sensing images at scale is non-trivial since it is expensive and knowledge-intensive. In this paper, we present a scalable multi-temporal remote sensing change data generator via generative modeling, which is cheap and automatic, alleviating these problems. Our main idea is to simulate a stochastic change process over time. We consider the stochastic change process as a probabilistic semantic state transition, namely generative probabilistic change model (GPCM), which decouples the complex simulation problem into two more trackable sub-problems, \ie, change event simulation and semantic change synthesis. To solve these two problems, we present the change generator (Changen), a GAN-based GPCM, enabling controllable object change data generation, including customizable object property, and change event. The extensive experiments suggest that our Changen has superior generation capability, and the change detectors with Changen pre-training exhibit excellent transferability to real-world change datasets.

* ICCV 2023

Via

Access Paper or Ask Questions

SAAN: Similarity-aware attention flow network for change detection with VHR remote sensing images

Aug 28, 2023

Haonan Guo, Xin Su, Chen Wu, Bo Du, Liangpei Zhang

Figure 1 for SAAN: Similarity-aware attention flow network for change detection with VHR remote sensing images

Figure 2 for SAAN: Similarity-aware attention flow network for change detection with VHR remote sensing images

Figure 3 for SAAN: Similarity-aware attention flow network for change detection with VHR remote sensing images

Figure 4 for SAAN: Similarity-aware attention flow network for change detection with VHR remote sensing images

Abstract:Change detection (CD) is a fundamental and important task for monitoring the land surface dynamics in the earth observation field. Existing deep learning-based CD methods typically extract bi-temporal image features using a weight-sharing Siamese encoder network and identify change regions using a decoder network. These CD methods, however, still perform far from satisfactorily as we observe that 1) deep encoder layers focus on irrelevant background regions and 2) the models' confidence in the change regions is inconsistent at different decoder stages. The first problem is because deep encoder layers cannot effectively learn from imbalanced change categories using the sole output supervision, while the second problem is attributed to the lack of explicit semantic consistency preservation. To address these issues, we design a novel similarity-aware attention flow network (SAAN). SAAN incorporates a similarity-guided attention flow module with deeply supervised similarity optimization to achieve effective change detection. Specifically, we counter the first issue by explicitly guiding deep encoder layers to discover semantic relations from bi-temporal input images using deeply supervised similarity optimization. The extracted features are optimized to be semantically similar in the unchanged regions and dissimilar in the changing regions. The second drawback can be alleviated by the proposed similarity-guided attention flow module, which incorporates similarity-guided attention modules and attention flow mechanisms to guide the model to focus on discriminative channels and regions. We evaluated the effectiveness and generalization ability of the proposed method by conducting experiments on a wide range of CD tasks. The experimental results demonstrate that our method achieves excellent performance on several CD tasks, with discriminative features and semantic consistency preserved.

* 15 pages,13 figures

Via

Access Paper or Ask Questions

Expediting Building Footprint Segmentation from High-resolution Remote Sensing Images via progressive lenient supervision

Jul 23, 2023

Haonan Guo, Bo Du, Chen Wu, Xin Su, Liangpei Zhang

Abstract:The efficacy of building footprint segmentation from remotely sensed images has been hindered by model transfer effectiveness. Many existing building segmentation methods were developed upon the encoder-decoder architecture of U-Net, in which the encoder is finetuned from the newly developed backbone networks that are pre-trained on ImageNet. However, the heavy computational burden of the existing decoder designs hampers the successful transfer of these modern encoder networks to remote sensing tasks. Even the widely-adopted deep supervision strategy fails to mitigate these challenges due to its invalid loss in hybrid regions where foreground and background pixels are intermixed. In this paper, we conduct a comprehensive evaluation of existing decoder network designs for building footprint segmentation and propose an efficient framework denoted as BFSeg to enhance learning efficiency and effectiveness. Specifically, a densely-connected coarse-to-fine feature fusion decoder network that facilitates easy and fast feature fusion across scales is proposed. Moreover, considering the invalidity of hybrid regions in the down-sampled ground truth during the deep supervision process, we present a lenient deep supervision and distillation strategy that enables the network to learn proper knowledge from deep supervision. Building upon these advancements, we have developed a new family of building segmentation networks, which consistently surpass prior works with outstanding performance and efficiency across a wide range of newly developed encoder networks. The code will be released on https://github.com/HaonanGuo/BFSeg-Efficient-Building-Footprint-Segmentation-Framework.

* 13 pages,8 figures. Submitted to IEEE Transactions on Neural Networks and Learning Systems

Via

Access Paper or Ask Questions

Building-road Collaborative Extraction from Remotely Sensed Images via Cross-Interaction

Jul 23, 2023

Haonan Guo, Xin Su, Chen Wu, Bo Du, Liangpei Zhang

Abstract:Buildings are the basic carrier of social production and human life; roads are the links that interconnect social networks. Building and road information has important application value in the frontier fields of regional coordinated development, disaster prevention, auto-driving, etc. Mapping buildings and roads from very high-resolution (VHR) remote sensing images have become a hot research topic. However, the existing methods often ignore the strong spatial correlation between roads and buildings and extract them in isolation. To fully utilize the complementary advantages between buildings and roads, we propose a building-road collaborative extraction method based on multi-task and cross-scale feature interaction to improve the accuracy of both tasks in a complementary way. A multi-task interaction module is proposed to interact information across tasks and preserve the unique information of each task, which tackle the seesaw phenomenon in multitask learning. By considering the variation in appearance and structure between buildings and roads, a cross-scale interaction module is designed to automatically learn the optimal reception field for different tasks. Compared with many existing methods that train each task individually, the proposed collaborative extraction method can utilize the complementary advantages between buildings and roads by the proposed inter-task and inter-scale feature interactions, and automatically select the optimal reception field for different tasks. Experiments on a wide range of urban and rural scenarios show that the proposed algorithm can achieve building-road extraction with outstanding performance and efficiency.

* 34 pages,9 figures, submitted to ISPRS Journal of Photogrammetry and Remote Sensing

Via

Access Paper or Ask Questions