
Yilei Shi

HTC-DC Net: Monocular Height Estimation from Single Remote Sensing Images

Sep 28, 2023
Sining Chen, Yilei Shi, Zhitong Xiong, Xiao Xiang Zhu

3D geo-information is of great significance for understanding the living environment; however, 3D perception from remote sensing data, especially on a large scale, is restricted. To tackle this problem, we propose a method for monocular height estimation from optical imagery, which is currently one of the richest sources of remote sensing data. As an ill-posed problem, monocular height estimation requires well-designed networks with enhanced representations to achieve good performance. Moreover, the distribution of height values is long-tailed, with the low-height pixels, e.g., the background, forming the head, so trained networks are usually biased and tend to underestimate building heights. To address these problems, instead of formulating height estimation as a pure regression task, we propose HTC-DC Net, which follows the classification-regression paradigm, with the head-tail cut (HTC) and the distribution-based constraints (DCs) as the main contributions. HTC-DC Net is composed of a backbone network as the feature extractor, the HTC-AdaBins module, and a hybrid regression process. The HTC-AdaBins module serves as the classification phase and determines bins adaptive to each input image. It is equipped with a vision transformer encoder to combine local context with holistic information and uses an HTC to address the long-tailed distribution in monocular height estimation, balancing the performance on foreground and background pixels. The hybrid regression process performs the regression by smoothing the bins from the classification phase and is trained via the DCs. The proposed network is tested on three datasets of different resolutions, namely ISPRS Vaihingen (0.09 m), DFC19 (1.3 m), and GBH (3 m). Experimental results show the superiority of the proposed network over existing methods by large margins. Extensive ablation studies demonstrate the effectiveness of each design component.
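
The classification-regression idea above can be illustrated with a minimal, AdaBins-style sketch (assumed names and shapes, not the authors' code): a per-image head predicts adaptive bin widths, and the final height is the probability-weighted sum of the resulting bin centers.

```python
import torch
import torch.nn.functional as F

def adaptive_bin_centers(bin_logits, min_h=0.0, max_h=100.0):
    """Turn per-image bin logits into bin centers spanning [min_h, max_h].

    bin_logits: (B, N_bins) unnormalized widths predicted per image.
    """
    widths = F.softmax(bin_logits, dim=1) * (max_h - min_h)                    # (B, N)
    edges = min_h + torch.cumsum(widths, dim=1)                                # right edges
    left = torch.cat([torch.full_like(edges[:, :1], min_h), edges[:, :-1]], dim=1)
    return 0.5 * (left + edges)                                                # (B, N)

def hybrid_height(pixel_logits, bin_centers):
    """Height as the probability-weighted sum of bin centers.

    pixel_logits: (B, N_bins, H, W) per-pixel classification scores.
    bin_centers:  (B, N_bins)
    """
    probs = F.softmax(pixel_logits, dim=1)                                     # (B, N, H, W)
    return torch.einsum("bnhw,bn->bhw", probs, bin_centers)                    # (B, H, W)

# toy usage
B, N, H, W = 2, 64, 8, 8
heights = hybrid_height(torch.randn(B, N, H, W), adaptive_bin_centers(torch.randn(B, N)))
print(heights.shape)  # torch.Size([2, 8, 8])
```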

* 18 pages, 10 figures, submitted to IEEE Transactions on Geoscience and Remote Sensing 

HyperLISTA-ABT: An Ultra-light Unfolded Network for Accurate Multi-component Differential Tomographic SAR Inversion

Sep 28, 2023
Kun Qian, Yuanyuan Wang, Peter Jung, Yilei Shi, Xiao Xiang Zhu

Deep neural networks based on unrolled iterative algorithms have achieved remarkable success in sparse reconstruction applications such as synthetic aperture radar (SAR) tomographic inversion (TomoSAR). However, the currently available deep learning-based TomoSAR algorithms are limited to three-dimensional (3D) reconstruction. The extension of deep learning-based algorithms to four-dimensional (4D) imaging, i.e., differential TomoSAR (D-TomoSAR) applications, is impeded mainly by the high-dimensional weight matrices required by the network designed for D-TomoSAR inversion, which typically contain millions of freely trainable parameters. Learning such a huge number of weights requires an enormous number of training samples, resulting in a large memory burden and excessive time consumption. To tackle this issue, we propose an efficient and accurate algorithm called HyperLISTA-ABT. The weights in HyperLISTA-ABT are determined analytically according to a minimum coherence criterion, trimming the model down to an ultra-light one with only three hyperparameters. Additionally, HyperLISTA-ABT improves on global thresholding by utilizing an adaptive blockwise thresholding scheme, which applies block-coordinate techniques and conducts thresholding in local blocks, so that weak expressions and local features can be retained in the shrinkage step layer by layer. Simulations demonstrated the effectiveness of our approach, showing that HyperLISTA-ABT achieves superior computational efficiency with no significant performance degradation compared to state-of-the-art methods. Real data experiments showed that a high-quality 4D point cloud can be reconstructed over a large area by the proposed HyperLISTA-ABT with affordable computational resources and within a short time.
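
A minimal sketch of the adaptive blockwise thresholding idea, assuming a simple per-block rule (rho times the block's peak magnitude; the paper's exact criterion may differ): each local block is shrunk with its own threshold, so weak but locally significant reflections survive where a single global threshold would remove them.

```python
import numpy as np

def soft_threshold(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def blockwise_soft_threshold(x, block_size=32, rho=0.1):
    """Apply soft thresholding block by block.

    Each block uses its own threshold rho * max|x_block|, so weak but locally
    significant entries survive where one global threshold would kill them.
    (The rho * max rule is an illustrative choice, not the paper's exact scheme.)
    """
    out = np.copy(x)
    for start in range(0, len(x), block_size):
        block = x[start:start + block_size]
        tau = rho * np.max(np.abs(block)) if block.size else 0.0
        out[start:start + block_size] = soft_threshold(block, tau)
    return out

# toy usage: a strong scatterer in one block, a weak one in another
x = np.zeros(64); x[5] = 1.0; x[40] = 0.05
print(np.nonzero(blockwise_soft_threshold(x))[0])  # both indices retained
```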

Self-supervised Domain-agnostic Domain Adaptation for Satellite Images

Sep 25, 2023
Fahong Zhang, Yilei Shi, Xiao Xiang Zhu

Domain shift caused by, e.g., different geographical regions or acquisition conditions is a common issue in machine learning for global-scale satellite image processing. A promising way to address this problem is domain adaptation, where the training and testing datasets are split into two or more domains according to their distributions and an adaptation method is applied to improve the generalizability of the model on the testing dataset. However, defining the domain to which each satellite image belongs is not trivial, especially under large-scale multi-temporal and multi-sensory scenarios, where a single image mosaic can be generated from multiple data sources. In this paper, we propose a self-supervised domain-agnostic domain adaptation (SS(DA)2) method to perform domain adaptation without such a domain definition. To achieve this, we first design a contrastive generative adversarial loss to train a generative network to perform image-to-image translation between any two satellite image patches. Then, we improve the generalizability of the downstream models by augmenting the training data with the spectral characteristics of different testing data. The experimental results on public benchmarks verify the effectiveness of SS(DA)2.
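
As an illustration of how such a translation network could be plugged into downstream training, here is a hedged sketch of the augmentation step; the generator interface and the per-channel statistics-matching stand-in are assumptions, not the paper's contrastive GAN.

```python
import random
import torch

def domain_agnostic_augment(batch, reference_pool, generator, p=0.5):
    """Augment a training batch by translating each patch toward the spectral
    style of a randomly drawn reference patch.

    batch:          (B, C, H, W) training patches
    reference_pool: list of (C, H, W) patches sampled from unlabeled test data
    generator:      callable G(src, ref) -> translated patch (a stand-in for the
                    image-to-image translation network; any style-transfer
                    function with this signature works here)
    """
    out = batch.clone()
    for i in range(batch.shape[0]):
        if random.random() < p:
            ref = random.choice(reference_pool).unsqueeze(0)
            out[i] = generator(batch[i:i + 1], ref).squeeze(0)
    return out

# toy stand-in generator: per-channel mean/std matching of the source to the reference
def match_stats(src, ref):
    mu_s, sd_s = src.mean((2, 3), keepdim=True), src.std((2, 3), keepdim=True) + 1e-6
    mu_r, sd_r = ref.mean((2, 3), keepdim=True), ref.std((2, 3), keepdim=True)
    return (src - mu_s) / sd_s * sd_r + mu_r

batch = torch.rand(4, 3, 64, 64)
pool = [torch.rand(3, 64, 64) for _ in range(10)]
augmented = domain_agnostic_augment(batch, pool, match_stats)
```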

Few-shot Object Detection in Remote Sensing: Lifting the Curse of Incompletely Annotated Novel Objects

Sep 19, 2023
Fahong Zhang, Yilei Shi, Zhitong Xiong, Xiao Xiang Zhu

Object detection is an essential and fundamental task in computer vision and satellite image processing. Existing deep learning methods have achieved impressive performance thanks to the availability of large-scale annotated datasets. Yet, in real-world applications the availability of labels is limited. In this context, few-shot object detection (FSOD) has emerged as a promising direction, which aims at enabling the model to detect novel objects with only a few annotated examples. However, many existing FSOD algorithms overlook a critical issue: when an input image contains multiple novel objects and only a subset of them are annotated, the unlabeled objects will be considered as background during training. This can cause confusion and severely impact the model's ability to recall novel objects. To address this issue, we propose a self-training-based FSOD (ST-FSOD) approach, which incorporates the self-training mechanism into the few-shot fine-tuning process. ST-FSOD aims to enable the discovery of novel objects that are not annotated and to take them into account during training. On the one hand, we devise a two-branch region proposal network (RPN) to separate the proposal extraction of base and novel objects; on the other hand, we incorporate the student-teacher mechanism into the RPN and the region of interest (RoI) head to include highly confident yet unlabeled targets as pseudo labels. Experimental results demonstrate that our proposed method outperforms the state of the art in various FSOD settings by a large margin. The code will be publicly available at https://github.com/zhu-xlab/ST-FSOD.
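
The pseudo-labeling idea can be sketched as follows (a hedged illustration with assumed thresholds and helper names, not the released implementation): a teacher model, maintained as an exponential moving average of the student, proposes boxes, and confident detections that do not overlap the annotated ground truth are added as pseudo labels, so they are no longer treated as background.

```python
import torch
from torchvision.ops import box_iou

@torch.no_grad()
def ema_update(teacher, student, momentum=0.999):
    """Update teacher weights as an exponential moving average of the student."""
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(momentum).add_(ps, alpha=1.0 - momentum)

@torch.no_grad()
def select_pseudo_boxes(teacher_boxes, teacher_scores, gt_boxes, score_thr=0.8, iou_thr=0.5):
    """Keep confident teacher detections that do not overlap annotated boxes,
    so unannotated novel objects stop being treated as background."""
    keep = teacher_scores >= score_thr
    if gt_boxes.numel() > 0 and keep.any():
        ious = box_iou(teacher_boxes[keep], gt_boxes)          # (K, G)
        keep_idx = keep.nonzero(as_tuple=True)[0]
        keep[keep_idx[ious.max(dim=1).values >= iou_thr]] = False
    return teacher_boxes[keep]

# toy usage: one detection matches an annotation, the other is a missed novel object
boxes = torch.tensor([[0., 0., 10., 10.], [50., 50., 80., 80.]])
scores = torch.tensor([0.95, 0.9])
gt = torch.tensor([[1., 1., 9., 9.]])
print(select_pseudo_boxes(boxes, scores, gt))  # only the non-annotated box survives
```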

UCDFormer: Unsupervised Change Detection Using a Transformer-driven Image Translation

Aug 02, 2023
Qingsong Xu, Yilei Shi, Jianhua Guo, Chaojun Ouyang, Xiao Xiang Zhu

Change detection (CD) by comparing two bi-temporal images is a crucial task in remote sensing. Because it requires no cumbersome labeled change information, unsupervised CD has attracted extensive attention in the community. However, existing unsupervised CD approaches rarely consider the seasonal and style differences incurred by illumination and atmospheric conditions in multi-temporal images. To this end, we propose a change detection setting with domain shift for remote sensing images. Furthermore, we present a novel unsupervised CD method using a lightweight transformer, called UCDFormer. Specifically, a transformer-driven image translation composed of a lightweight transformer and a domain-specific affinity weight is first proposed to mitigate the domain shift between the two images with real-time efficiency. After image translation, we can generate the difference map between the translated before-event image and the original after-event image. Then, a novel reliable pixel extraction module is proposed to select significantly changed/unchanged pixel positions by fusing the pseudo change maps of fuzzy c-means clustering and adaptive thresholding. Finally, a binary change map is obtained based on these selected pixel pairs and a binary classifier. Experimental results on different unsupervised CD tasks with seasonal and style changes demonstrate the effectiveness of the proposed UCDFormer. For example, compared with several other related methods, UCDFormer improves performance on the Kappa coefficient by more than 12%. In addition, UCDFormer achieves excellent performance for earthquake-induced landslide detection when considering large-scale applications. The code is available at https://github.com/zhu-xlab/UCDFormer
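
A minimal sketch of the reliable pixel extraction idea, with a crisp two-cluster k-means standing in for fuzzy c-means and Otsu as the adaptive threshold (both are assumptions, not the paper's exact modules): pixels are kept only where the two pseudo change maps agree.

```python
import numpy as np

def otsu_threshold(values, nbins=256):
    """Adaptive (Otsu) threshold on a 1-D array of difference magnitudes."""
    hist, edges = np.histogram(values, bins=nbins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    p = hist / hist.sum()
    w0 = np.cumsum(p)
    w1 = 1.0 - w0
    m = np.cumsum(p * centers)
    mt = m[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mt * w0 - m) ** 2 / (w0 * w1)   # between-class variance
    sigma_b[~np.isfinite(sigma_b)] = 0.0
    return centers[np.argmax(sigma_b)]

def two_means_labels(values, iters=20):
    """Minimal 1-D two-cluster k-means (a crisp stand-in for fuzzy c-means)."""
    c = np.array([values.min(), values.max()], dtype=float)
    for _ in range(iters):
        labels = np.abs(values[:, None] - c[None, :]).argmin(axis=1)
        for k in range(2):
            if (labels == k).any():
                c[k] = values[labels == k].mean()
    return labels if c[1] >= c[0] else 1 - labels   # cluster 1 = "changed"

def reliable_pixels(diff_map):
    """Select pixels where the adaptive threshold and the clustering agree."""
    v = diff_map.ravel()
    changed_thr = v > otsu_threshold(v)
    changed_clu = two_means_labels(v).astype(bool)
    changed = (changed_thr & changed_clu).reshape(diff_map.shape)
    unchanged = (~changed_thr & ~changed_clu).reshape(diff_map.shape)
    return changed, unchanged

# toy usage on a random difference map
diff = np.abs(np.random.rand(64, 64) - np.random.rand(64, 64))
changed, unchanged = reliable_pixels(diff)
```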

* 16 pages, 7 figures, IEEE Transactions on Geoscience and Remote Sensing 

PolyGNN: Polyhedron-based Graph Neural Network for 3D Building Reconstruction from Point Clouds

Jul 17, 2023
Zhaiyu Chen, Yilei Shi, Liangliang Nan, Zhitong Xiong, Xiao Xiang Zhu

We present PolyGNN, a polyhedron-based graph neural network for 3D building reconstruction from point clouds. PolyGNN learns to assemble primitives obtained by polyhedral decomposition via graph node classification, achieving a watertight, compact, and weakly semantic reconstruction. To effectively represent arbitrarily shaped polyhedra in the neural network, we propose three different sampling strategies to select representative points as polyhedron-wise queries, enabling efficient occupancy inference. Furthermore, we incorporate the inter-polyhedron adjacency to enhance the classification of the graph nodes. We also observe that existing city-building models are abstractions of the underlying instances. To address this abstraction gap and provide a fair evaluation of the proposed method, we develop our method on a large-scale synthetic dataset covering 500k+ buildings with well-defined ground truth of polyhedral class labels. We further conduct a transferability analysis across cities and on real-world point clouds. Both qualitative and quantitative results demonstrate the effectiveness of our method, particularly its efficiency for large-scale reconstructions. The source code and data of our work are available at https://github.com/chenzhaiyu/polygnn.
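
A hedged sketch of the polyhedron-as-node formulation in plain PyTorch (layer sizes, pooling, and aggregation are illustrative assumptions): features are pooled from query points sampled per polyhedron, refined by message passing over the polyhedron adjacency, and classified into inside/outside.

```python
import torch
import torch.nn as nn

class PolyhedronOccupancyGNN(nn.Module):
    """Classify each polyhedron of a decomposition as inside/outside a building.

    Node features are pooled from points sampled inside each polyhedron; a few
    rounds of mean aggregation over the polyhedron adjacency graph add context.
    """
    def __init__(self, point_dim=3, hidden=64, layers=2):
        super().__init__()
        self.point_mlp = nn.Sequential(nn.Linear(point_dim, hidden), nn.ReLU(),
                                       nn.Linear(hidden, hidden))
        self.gnn = nn.ModuleList([nn.Linear(2 * hidden, hidden) for _ in range(layers)])
        self.head = nn.Linear(hidden, 1)

    def forward(self, query_points, adj):
        # query_points: (N_poly, K, 3) sampled per polyhedron; adj: (N_poly, N_poly) 0/1
        h = self.point_mlp(query_points).max(dim=1).values      # (N_poly, hidden)
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        for layer in self.gnn:
            neigh = adj @ h / deg                                # mean over neighbors
            h = torch.relu(layer(torch.cat([h, neigh], dim=1)))
        return self.head(h).squeeze(-1)                          # occupancy logits

# toy usage: 5 polyhedra, 16 query points each, a chain adjacency
pts = torch.rand(5, 16, 3)
adj = torch.diag(torch.ones(4), 1) + torch.diag(torch.ones(4), -1)
logits = PolyhedronOccupancyGNN()(pts, adj)
print(logits.shape)  # torch.Size([5])
```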

DisasterNets: Embedding Machine Learning in Disaster Mapping

Jun 16, 2023
Qingsong Xu, Yilei Shi, Xiao Xiang Zhu

Disaster mapping is a critical task that often requires on-site experts and is time-consuming. To address this, we present a comprehensive framework, termed DisasterNets, for fast and accurate recognition of disasters using machine learning. It consists of two stages, space granulation and attribute granulation. The space granulation stage leverages supervised/semi-supervised learning, unsupervised change detection, and domain adaptation (with or without source data) to handle different disaster mapping scenarios. The attribute granulation stage then builds a disaster database with the corresponding geographic information field properties. The framework is applied to earthquake-triggered landslide mapping and large-scale flood mapping. The results demonstrate competitive performance for high-precision, high-efficiency, and cross-scene recognition of disasters. To bridge the gap between the disaster mapping and machine learning communities, we will provide an openly accessible tool based on DisasterNets. The framework and tool will be available at https://github.com/HydroPML/DisasterNets.

* 4 pages, IEEE IGARSS 2023 

Basis Pursuit Denoising via Recurrent Neural Network Applied to Super-resolving SAR Tomography

May 23, 2023
Kun Qian, Yuanyuan Wang, Peter Jung, Yilei Shi, Xiao Xiang Zhu

Finding sparse solutions of underdetermined linear systems commonly requires solving an L1-regularized least-squares minimization problem, also known as basis pursuit denoising (BPDN). This is computationally expensive since it cannot be solved analytically. An emerging technique known as deep unrolling combines the descriptive ability of neural networks with explainability and computational efficiency for BPDN. Many unrolled neural networks for BPDN, e.g., the learned iterative shrinkage thresholding algorithm and its variants, employ shrinkage functions to prune elements with small magnitude. Through experiments on synthetic aperture radar tomography (TomoSAR), we discover that the shrinkage step leads to unavoidable information loss in the dynamics of the networks and degrades the performance of the model. We propose a recurrent neural network (RNN) with novel sparse minimal gated units (SMGUs) to solve the information loss issue. The proposed RNN architecture with SMGUs benefits from incorporating historical information into the optimization and thus effectively preserves full information in the final output. Taking TomoSAR inversion as an example, extensive simulations demonstrated that the proposed RNN outperforms the state-of-the-art deep learning-based algorithm in terms of super-resolution power as well as generalization ability. It achieved a 10% to 20% higher double-scatterer detection rate and is less sensitive to phase and amplitude ratio differences between scatterers. Tests on real TerraSAR-X spotlight images also show a high-quality 3-D reconstruction of the test site.
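
For context, the classic (non-learned) ISTA iteration for BPDN that LISTA-style unrolled networks accelerate is sketched below; the shrinkage (soft-threshold) step at the end of each iteration is exactly the operation whose information loss motivates the proposed SMGU-based RNN. This is a textbook baseline, not the paper's method.

```python
import numpy as np

def soft_threshold(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def ista_bpdn(A, y, lam=0.1, iters=500):
    """Classic ISTA for basis pursuit denoising:
        minimize 0.5 * ||A x - y||_2^2 + lam * ||x||_1
    Each iteration is a gradient step on the quadratic term followed by the
    shrinkage (soft-threshold) step on the L1 term.
    """
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - y)
        x = soft_threshold(x - grad / L, lam / L)
    return x

# toy usage: recover a 2-sparse vector from an overcomplete dictionary
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 100)) / np.sqrt(30)
x_true = np.zeros(100); x_true[[7, 42]] = [1.5, -2.0]
x_hat = ista_bpdn(A, A @ x_true, lam=0.05)
print(np.argsort(np.abs(x_hat))[-2:])      # indices of the two strongest entries
```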

High Quality Large-Scale 3-D Urban Mapping with Multi-Master TomoSAR

May 08, 2023
Yilei Shi, Richard Bamler, Yuanyuan Wang, Xiao Xiang Zhu

Multi-baseline interferometric synthetic aperture radar (InSAR) techniques are effective approaches for retrieving the 3-D information of urban areas. In order to obtain a plausible reconstruction, it is necessary to use a large stack of interferograms. Hence, these methods are commonly not appropriate for large-scale 3-D urban mapping using TanDEM-X data, where only a few acquisitions are available on average for each city. This work proposes a new SAR tomographic processing framework that works with such extremely small stacks by integrating non-local filtering into the SAR tomography inversion. The applicability of the algorithm is demonstrated using a TanDEM-X multi-baseline stack with 5 bistatic interferograms over the whole city of Munich, Germany. A systematic comparison of our result with airborne LiDAR data shows that the relative height accuracy of two thirds of the buildings is within two meters, which outperforms the TanDEM-X raw DEM. The promising performance of the proposed algorithm is a first step towards high-quality large-scale 3-D urban mapping.
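
For orientation, a minimal single-pixel TomoSAR inversion by classical beamforming is sketched below (the parameter values are illustrative); the framework's non-local filtering would denoise the interferometric stack before such an inversion and is not shown here.

```python
import numpy as np

def tomo_beamforming(g, baselines, wavelength, r, s_grid):
    """Single-pixel TomoSAR inversion by beamforming (matched filtering).

    g:         complex measurement vector over N interferometric acquisitions
    baselines: perpendicular baselines b_n (m)
    r:         slant range (m); s_grid: candidate elevations (m)
    Returns the reflectivity power profile along elevation.
    """
    xi = 2.0 * baselines / (wavelength * r)                   # spatial frequencies
    A = np.exp(1j * 2.0 * np.pi * np.outer(xi, s_grid))       # steering matrix (N, S)
    return np.abs(A.conj().T @ g) ** 2 / len(g)               # power profile (S,)

# toy usage: a single scatterer at 20 m elevation, observed with only 5 acquisitions
rng = np.random.default_rng(1)
baselines = rng.uniform(-150, 150, size=5)
wavelength, r = 0.031, 700e3                                  # X-band, spaceborne geometry
xi = 2.0 * baselines / (wavelength * r)
g = np.exp(1j * 2.0 * np.pi * xi * 20.0)                      # noise-free signal
s_grid = np.linspace(-50, 50, 201)
profile = tomo_beamforming(g, baselines, wavelength, r, s_grid)
print(s_grid[np.argmax(profile)])                             # ~20.0
```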

* 7 pages. arXiv admin note: substantial text overlap with arXiv:2003.07803 