Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Mengke Yuan

Pattern Integration and Enhancement Vision Transformer for Self-Supervised Learning in Remote Sensing

Nov 09, 2024

Kaixuan Lu, Ruiqian Zhang, Xiao Huang, Yuxing Xie, Xiaogang Ning, Hanchao Zhang, Mengke Yuan, Pan Zhang, Tao Wang, Tongkui Liao

Figure 1 for Pattern Integration and Enhancement Vision Transformer for Self-Supervised Learning in Remote Sensing

Figure 2 for Pattern Integration and Enhancement Vision Transformer for Self-Supervised Learning in Remote Sensing

Figure 3 for Pattern Integration and Enhancement Vision Transformer for Self-Supervised Learning in Remote Sensing

Figure 4 for Pattern Integration and Enhancement Vision Transformer for Self-Supervised Learning in Remote Sensing

Abstract:Recent self-supervised learning (SSL) methods have demonstrated impressive results in learning visual representations from unlabeled remote sensing images. However, most remote sensing images predominantly consist of scenographic scenes containing multiple ground objects without explicit foreground targets, which limits the performance of existing SSL methods that focus on foreground targets. This raises the question: Is there a method that can automatically aggregate similar objects within scenographic remote sensing images, thereby enabling models to differentiate knowledge embedded in various geospatial patterns for improved feature representation? In this work, we present the Pattern Integration and Enhancement Vision Transformer (PIEViT), a novel self-supervised learning framework designed specifically for remote sensing imagery. PIEViT utilizes a teacher-student architecture to address both image-level and patch-level tasks. It employs the Geospatial Pattern Cohesion (GPC) module to explore the natural clustering of patches, enhancing the differentiation of individual features. The Feature Integration Projection (FIP) module further refines masked token reconstruction using geospatially clustered patches. We validated PIEViT across multiple downstream tasks, including object detection, semantic segmentation, and change detection. Experiments demonstrated that PIEViT enhances the representation of internal patch features, providing significant improvements over existing self-supervised baselines. It achieves excellent results in object detection, land cover classification, and change detection, underscoring its robustness, generalization, and transferability for remote sensing image interpretation tasks.

Via

Access Paper or Ask Questions

Dynamic Loss Decay based Robust Oriented Object Detection on Remote Sensing Images with Noisy Labels

May 15, 2024

Guozhang Liu, Ting Liu, Mengke Yuan, Tao Pang, Guangxing Yang, Hao Fu, Tao Wang, Tongkui Liao

Figure 1 for Dynamic Loss Decay based Robust Oriented Object Detection on Remote Sensing Images with Noisy Labels

Figure 2 for Dynamic Loss Decay based Robust Oriented Object Detection on Remote Sensing Images with Noisy Labels

Figure 3 for Dynamic Loss Decay based Robust Oriented Object Detection on Remote Sensing Images with Noisy Labels

Figure 4 for Dynamic Loss Decay based Robust Oriented Object Detection on Remote Sensing Images with Noisy Labels

Abstract:The ambiguous appearance, tiny scale, and fine-grained classes of objects in remote sensing imagery inevitably lead to the noisy annotations in category labels of detection dataset. However, the effects and treatments of the label noises are underexplored in modern oriented remote sensing object detectors. To address this issue, we propose a robust oriented remote sensing object detection method through dynamic loss decay (DLD) mechanism, inspired by the two phase ``early-learning'' and ``memorization'' learning dynamics of deep neural networks on clean and noisy samples. To be specific, we first observe the end point of early learning phase termed as EL, after which the models begin to memorize the false labels that significantly degrade the detection accuracy. Secondly, under the guidance of the training indicator, the losses of each sample are ranked in descending order, and we adaptively decay the losses of the top K largest ones (bad samples) in the following epochs. Because these large losses are of high confidence to be calculated with wrong labels. Experimental results show that the method achieves excellent noise resistance performance tested on multiple public datasets such as HRSC2016 and DOTA-v1.0/v2.0 with synthetic category label noise. Our solution also has won the 2st place in the "fine-grained object detection based on sub-meter remote sensing imagery" track with noisy labels of 2023 National Big Data and Computing Intelligence Challenge.

Via

Access Paper or Ask Questions

HGDNet: A Height-Hierarchy Guided Dual-Decoder Network for Single View Building Extraction and Height Estimation

Aug 10, 2023

Chaoran Lu, Ningning Cao, Pan Zhang, Ting Liu, Baochai Peng, Guozhang Liu, Mengke Yuan, Sen Zhang, Simin Huang, Tao Wang

Figure 1 for HGDNet: A Height-Hierarchy Guided Dual-Decoder Network for Single View Building Extraction and Height Estimation

Figure 2 for HGDNet: A Height-Hierarchy Guided Dual-Decoder Network for Single View Building Extraction and Height Estimation

Figure 3 for HGDNet: A Height-Hierarchy Guided Dual-Decoder Network for Single View Building Extraction and Height Estimation

Figure 4 for HGDNet: A Height-Hierarchy Guided Dual-Decoder Network for Single View Building Extraction and Height Estimation

Abstract:Unifying the correlative single-view satellite image building extraction and height estimation tasks indicates a promising way to share representations and acquire generalist model for large-scale urban 3D reconstruction. However, the common spatial misalignment between building footprints and stereo-reconstructed nDSM height labels incurs degraded performance on both tasks. To address this issue, we propose a Height-hierarchy Guided Dual-decoder Network (HGDNet) to estimate building height. Under the guidance of synthesized discrete height-hierarchy nDSM, auxiliary height-hierarchical building extraction branch enhance the height estimation branch with implicit constraints, yielding an accuracy improvement of more than 6% on the DFC 2023 track2 dataset. Additional two-stage cascade architecture is adopted to achieve more accurate building extraction. Experiments on the DFC 2023 Track 2 dataset shows the superiority of the proposed method in building height estimation ({\delta}1:0.8012), instance extraction (AP50:0.7730), and the final average score 0.7871 ranks in the first place in test phase.

Via

Access Paper or Ask Questions

Fine-grained building roof instance segmentation based on domain adapted pretraining and composite dual-backbone

Aug 10, 2023

Guozhang Liu, Baochai Peng, Ting Liu, Pan Zhang, Mengke Yuan, Chaoran Lu, Ningning Cao, Sen Zhang, Simin Huang, Tao Wang

Abstract:The diversity of building architecture styles of global cities situated on various landforms, the degraded optical imagery affected by clouds and shadows, and the significant inter-class imbalance of roof types pose challenges for designing a robust and accurate building roof instance segmentor. To address these issues, we propose an effective framework to fulfill semantic interpretation of individual buildings with high-resolution optical satellite imagery. Specifically, the leveraged domain adapted pretraining strategy and composite dual-backbone greatly facilitates the discriminative feature learning. Moreover, new data augmentation pipeline, stochastic weight averaging (SWA) training and instance segmentation based model ensemble in testing are utilized to acquire additional performance boost. Experiment results show that our approach ranks in the first place of the 2023 IEEE GRSS Data Fusion Contest (DFC) Track 1 test phase ($mAP_{50}$:50.6\%). Note-worthily, we have also explored the potential of multimodal data fusion with both optical satellite imagery and SAR data.

Via

Access Paper or Ask Questions

Hardware-Efficient Guided Image Filtering For Multi-Label Problem

Feb 28, 2018

Longquan Dai, Mengke Yuan, Zechao Li, Xiaopeng Zhang, Jinhui Tang

Figure 1 for Hardware-Efficient Guided Image Filtering For Multi-Label Problem

Figure 2 for Hardware-Efficient Guided Image Filtering For Multi-Label Problem

Figure 3 for Hardware-Efficient Guided Image Filtering For Multi-Label Problem

Figure 4 for Hardware-Efficient Guided Image Filtering For Multi-Label Problem

Abstract:The Guided Filter (GF) is well-known for its linear complexity. However, when filtering an image with an n-channel guidance, GF needs to invert an n x n matrix for each pixel. To the best of our knowledge existing matrix inverse algorithms are inefficient on current hardwares. This shortcoming limits applications of multichannel guidance in computation intensive system such as multi-label system. We need a new GF-like filter that can perform fast multichannel image guided filtering. Since the optimal linear complexity of GF cannot be minimized further, the only way thus is to bring all potentialities of current parallel computing hardwares into full play. In this paper we propose a hardware-efficient Guided Filter (HGF), which solves the efficiency problem of multichannel guided image filtering and yields competent results when applying it to multi-label problems with synthesized polynomial multichannel guidance. Specifically, in order to boost the filtering performance, HGF takes a new matrix inverse algorithm which only involves two hardware-efficient operations: element-wise arithmetic calculations and box filtering. In order to break the linear model restriction, HGF synthesizes a polynomial multichannel guidance to introduce nonlinearity. Benefiting from our polynomial guidance and hardware-efficient matrix inverse algorithm, HGF not only is more sensitive to the underlying structure of guidance but also achieves the fastest computing speed. Due to these merits, HGF obtains state-of-the-art results in terms of accuracy and efficiency in the computation intensive multi-label

Via

Access Paper or Ask Questions

Speeding Up the Bilateral Filter: A Joint Acceleration Way

Feb 28, 2018

Longquan Dai, Mengke Yuan, Xiaopeng Zhang

Figure 1 for Speeding Up the Bilateral Filter: A Joint Acceleration Way

Figure 2 for Speeding Up the Bilateral Filter: A Joint Acceleration Way

Figure 3 for Speeding Up the Bilateral Filter: A Joint Acceleration Way

Figure 4 for Speeding Up the Bilateral Filter: A Joint Acceleration Way

Abstract:Computational complexity of the brute-force implementation of the bilateral filter (BF) depends on its filter kernel size. To achieve the constant-time BF whose complexity is irrelevant to the kernel size, many techniques have been proposed, such as 2D box filtering, dimension promotion, and shiftability property. Although each of the above techniques suffers from accuracy and efficiency problems, previous algorithm designers were used to take only one of them to assemble fast implementations due to the hardness of combining them together. Hence, no joint exploitation of these techniques has been proposed to construct a new cutting edge implementation that solves these problems. Jointly employing five techniques: kernel truncation, best N -term approximation as well as previous 2D box filtering, dimension promotion, and shiftability property, we propose a unified framework to transform BF with arbitrary spatial and range kernels into a set of 3D box filters that can be computed in linear time. To the best of our knowledge, our algorithm is the first method that can integrate all these acceleration techniques and, therefore, can draw upon one another's strong point to overcome deficiencies. The strength of our method has been corroborated by several carefully designed experiments. In particular, the filtering accuracy is significantly improved without sacrificing the efficiency at running time.

Via

Access Paper or Ask Questions