Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Gui-Song Xia

DAM-Net: Global Flood Detection from SAR Imagery Using Differential Attention Metric-Based Vision Transformers

Jun 01, 2023

Tamer Saleh, Xingxing Weng, Shimaa Holail, Chen Hao, Gui-Song Xia

Figure 1 for DAM-Net: Global Flood Detection from SAR Imagery Using Differential Attention Metric-Based Vision Transformers

Figure 2 for DAM-Net: Global Flood Detection from SAR Imagery Using Differential Attention Metric-Based Vision Transformers

Figure 3 for DAM-Net: Global Flood Detection from SAR Imagery Using Differential Attention Metric-Based Vision Transformers

Figure 4 for DAM-Net: Global Flood Detection from SAR Imagery Using Differential Attention Metric-Based Vision Transformers

Abstract:The detection of flooded areas using high-resolution synthetic aperture radar (SAR) imagery is a critical task with applications in crisis and disaster management, as well as environmental resource planning. However, the complex nature of SAR images presents a challenge that often leads to an overestimation of the flood extent. To address this issue, we propose a novel differential attention metric-based network (DAM-Net) in this study. The DAM-Net comprises two key components: a weight-sharing Siamese backbone to obtain multi-scale change features of multi-temporal images and tokens containing high-level semantic information of water-body changes, and a temporal differential fusion (TDF) module that integrates semantic tokens and change features to generate flood maps with reduced speckle noise. Specifically, the backbone is split into multiple stages. In each stage, we design three modules, namely, temporal-wise feature extraction (TWFE), cross-temporal change attention (CTCA), and temporal-aware change enhancement (TACE), to effectively extract the change features. In TACE of the last stage, we introduce a class token to record high-level semantic information of water-body changes via the attention mechanism. Another challenge faced by data-driven deep learning algorithms is the limited availability of flood detection datasets. To overcome this, we have created the S1GFloods open-source dataset, a global-scale high-resolution Sentinel-1 SAR image pairs dataset covering 46 global flood events between 2015 and 2022. The experiments on the S1GFloods dataset using the proposed DAM-Net showed top results compared to state-of-the-art methods in terms of overall accuracy, F1-score, and IoU, which reached 97.8%, 96.5%, and 93.2%, respectively. Our dataset and code will be available online at https://github.com/Tamer-Saleh/S1GFlood-Detection.

* 16 pages, 11 figures

Via

Access Paper or Ask Questions

HGFormer: Hierarchical Grouping Transformer for Domain Generalized Semantic Segmentation

May 22, 2023

Jian Ding, Nan Xue, Gui-Song Xia, Bernt Schiele, Dengxin Dai

Abstract:Current semantic segmentation models have achieved great success under the independent and identically distributed (i.i.d.) condition. However, in real-world applications, test data might come from a different domain than training data. Therefore, it is important to improve model robustness against domain differences. This work studies semantic segmentation under the domain generalization setting, where a model is trained only on the source domain and tested on the unseen target domain. Existing works show that Vision Transformers are more robust than CNNs and show that this is related to the visual grouping property of self-attention. In this work, we propose a novel hierarchical grouping transformer (HGFormer) to explicitly group pixels to form part-level masks and then whole-level masks. The masks at different scales aim to segment out both parts and a whole of classes. HGFormer combines mask classification results at both scales for class label prediction. We assemble multiple interesting cross-domain settings by using seven public semantic segmentation datasets. Experiments show that HGFormer yields more robust semantic segmentation results than per-pixel classification methods and flat grouping transformers, and outperforms previous methods significantly. Code will be available at https://github.com/dingjiansw101/HGFormer.

* Accepted by CVPR 2023

Via

Access Paper or Ask Questions

FreePoint: Unsupervised Point Cloud Instance Segmentation

May 11, 2023

Zhikai Zhang, Jian Ding, Li Jiang, Dengxin Dai, Gui-Song Xia

Figure 1 for FreePoint: Unsupervised Point Cloud Instance Segmentation

Figure 2 for FreePoint: Unsupervised Point Cloud Instance Segmentation

Figure 3 for FreePoint: Unsupervised Point Cloud Instance Segmentation

Figure 4 for FreePoint: Unsupervised Point Cloud Instance Segmentation

Abstract:Instance segmentation of point clouds is a crucial task in 3D field with numerous applications that involve localizing and segmenting objects in a scene. However, achieving satisfactory results requires a large number of manual annotations, which is a time-consuming and expensive process. To alleviate dependency on annotations, we propose a method, called FreePoint, for underexplored unsupervised class-agnostic instance segmentation on point clouds. In detail, we represent the point features by combining coordinates, colors, normals, and self-supervised deep features. Based on the point features, we perform a multicut algorithm to segment point clouds into coarse instance masks as pseudo labels, which are used to train a point cloud instance segmentation model. To alleviate the inaccuracy of coarse masks during training, we propose a weakly-supervised training strategy and corresponding loss. Our work can also serve as an unsupervised pre-training pretext for supervised semantic instance segmentation with limited annotations. For class-agnostic instance segmentation on point clouds, FreePoint largely fills the gap with its fully-supervised counterpart based on the state-of-the-art instance segmentation model Mask3D and even surpasses some previous fully-supervised methods. When serving as a pretext task and fine-tuning on S3DIS, FreePoint outperforms training from scratch by 5.8% AP with only 10% mask annotations.

Via

Access Paper or Ask Questions

Self-Supervised Scene Dynamic Recovery from Rolling Shutter Images and Events

Apr 19, 2023

Yangguang Wang, Xiang Zhang, Mingyuan Lin, Lei Yu, Boxin Shi, Wen Yang, Gui-Song Xia

Figure 1 for Self-Supervised Scene Dynamic Recovery from Rolling Shutter Images and Events

Figure 2 for Self-Supervised Scene Dynamic Recovery from Rolling Shutter Images and Events

Figure 3 for Self-Supervised Scene Dynamic Recovery from Rolling Shutter Images and Events

Figure 4 for Self-Supervised Scene Dynamic Recovery from Rolling Shutter Images and Events

Abstract:Scene Dynamic Recovery (SDR) by inverting distorted Rolling Shutter (RS) images to an undistorted high frame-rate Global Shutter (GS) video is a severely ill-posed problem due to the missing temporal dynamic information in both RS intra-frame scanlines and inter-frame exposures, particularly when prior knowledge about camera/object motions is unavailable. Commonly used artificial assumptions on scenes/motions and data-specific characteristics are prone to producing sub-optimal solutions in real-world scenarios. To address this challenge, we propose an event-based SDR network within a self-supervised learning paradigm, i.e., SelfUnroll. We leverage the extremely high temporal resolution of event cameras to provide accurate inter/intra-frame dynamic information. Specifically, an Event-based Inter/intra-frame Compensator (E-IC) is proposed to predict the per-pixel dynamic between arbitrary time intervals, including the temporal transition and spatial translation. Exploring connections in terms of RS-RS, RS-GS, and GS-RS, we explicitly formulate mutual constraints with the proposed E-IC, resulting in supervisions without ground-truth GS images. Extensive evaluations over synthetic and real datasets demonstrate that the proposed method achieves state-of-the-art and shows remarkable performance for event-based RS2GS inversion in real-world scenarios. The dataset and code are available at https://w3un.github.io/selfunroll/.

Via

Access Paper or Ask Questions

Dynamic Coarse-to-Fine Learning for Oriented Tiny Object Detection

Apr 18, 2023

Chang Xu, Jian Ding, Jinwang Wang, Wen Yang, Huai Yu, Lei Yu, Gui-Song Xia

Figure 1 for Dynamic Coarse-to-Fine Learning for Oriented Tiny Object Detection

Figure 2 for Dynamic Coarse-to-Fine Learning for Oriented Tiny Object Detection

Figure 3 for Dynamic Coarse-to-Fine Learning for Oriented Tiny Object Detection

Figure 4 for Dynamic Coarse-to-Fine Learning for Oriented Tiny Object Detection

Abstract:Detecting arbitrarily oriented tiny objects poses intense challenges to existing detectors, especially for label assignment. Despite the exploration of adaptive label assignment in recent oriented object detectors, the extreme geometry shape and limited feature of oriented tiny objects still induce severe mismatch and imbalance issues. Specifically, the position prior, positive sample feature, and instance are mismatched, and the learning of extreme-shaped objects is biased and unbalanced due to little proper feature supervision. To tackle these issues, we propose a dynamic prior along with the coarse-to-fine assigner, dubbed DCFL. For one thing, we model the prior, label assignment, and object representation all in a dynamic manner to alleviate the mismatch issue. For another, we leverage the coarse prior matching and finer posterior constraint to dynamically assign labels, providing appropriate and relatively balanced supervision for diverse instances. Extensive experiments on six datasets show substantial improvements to the baseline. Notably, we obtain the state-of-the-art performance for one-stage detectors on the DOTA-v1.5, DOTA-v2.0, and DIOR-R datasets under single-scale training and testing. Codes are available at https://github.com/Chasel-Tsui/mmrotate-dcfl.

* Accepted by CVPR2023

Via

Access Paper or Ask Questions

Recovering Continuous Scene Dynamics from A Single Blurry Image with Events

Apr 05, 2023

Zhangyi Cheng, Xiang Zhang, Lei Yu, Jianzhuang Liu, Wen Yang, Gui-Song Xia

Abstract:This paper aims at demystifying a single motion-blurred image with events and revealing temporally continuous scene dynamics encrypted behind motion blurs. To achieve this end, an Implicit Video Function (IVF) is learned to represent a single motion blurred image with concurrent events, enabling the latent sharp image restoration of arbitrary timestamps in the range of imaging exposures. Specifically, a dual attention transformer is proposed to efficiently leverage merits from both modalities, i.e., the high temporal resolution of event features and the smoothness of image features, alleviating temporal ambiguities while suppressing the event noise. The proposed network is trained only with the supervision of ground-truth images of limited referenced timestamps. Motion- and texture-guided supervisions are employed simultaneously to enhance restorations of the non-referenced timestamps and improve the overall sharpness. Experiments on synthetic, semi-synthetic, and real-world datasets demonstrate that our proposed method outperforms state-of-the-art methods by a large margin in terms of both objective PSNR and SSIM measurements and subjective evaluations.

Via

Access Paper or Ask Questions

Sat2Density: Faithful Density Learning from Satellite-Ground Image Pairs

Mar 26, 2023

Ming Qian, Jincheng Xiong, Gui-Song Xia, Nan Xue

Figure 1 for Sat2Density: Faithful Density Learning from Satellite-Ground Image Pairs

Figure 2 for Sat2Density: Faithful Density Learning from Satellite-Ground Image Pairs

Figure 3 for Sat2Density: Faithful Density Learning from Satellite-Ground Image Pairs

Figure 4 for Sat2Density: Faithful Density Learning from Satellite-Ground Image Pairs

Abstract:This paper aims to develop an accurate 3D geometry representation of satellite images using satellite-ground image pairs. Our focus is on the challenging problem of generating ground-view panoramas from satellite images. We draw inspiration from the density field representation used in volumetric neural rendering and propose a new approach, called Sat2Density. Our method utilizes the properties of ground-view panoramas for the sky and non-sky regions to learn faithful density fields of 3D scenes in a geometric perspective. Unlike other methods that require extra 3D information during training, our Sat2Density can automatically learn the accurate and faithful 3D geometry via density representation from 2D-only supervision. This advancement significantly improves the ground-view panorama synthesis task. Additionally, our study provides a new geometric perspective to understand the relationship between satellite and ground-view images in 3D space.

* Project Page: https://sat2density.github.io

Via

Access Paper or Ask Questions

Learning to Super-Resolve Blurry Images with Events

Feb 27, 2023

Lei Yu, Bishan Wang, Xiang Zhang, Haijian Zhang, Wen Yang, Jianzhuang Liu, Gui-Song Xia

Figure 1 for Learning to Super-Resolve Blurry Images with Events

Figure 2 for Learning to Super-Resolve Blurry Images with Events

Figure 3 for Learning to Super-Resolve Blurry Images with Events

Figure 4 for Learning to Super-Resolve Blurry Images with Events

Abstract:Super-Resolution from a single motion Blurred image (SRB) is a severely ill-posed problem due to the joint degradation of motion blurs and low spatial resolution. In this paper, we employ events to alleviate the burden of SRB and propose an Event-enhanced SRB (E-SRB) algorithm, which can generate a sequence of sharp and clear images with High Resolution (HR) from a single blurry image with Low Resolution (LR). To achieve this end, we formulate an event-enhanced degeneration model to consider the low spatial resolution, motion blurs, and event noises simultaneously. We then build an event-enhanced Sparse Learning Network (eSL-Net++) upon a dual sparse learning scheme where both events and intensity frames are modeled with sparse representations. Furthermore, we propose an event shuffle-and-merge scheme to extend the single-frame SRB to the sequence-frame SRB without any additional training process. Experimental results on synthetic and real-world datasets show that the proposed eSL-Net++ outperforms state-of-the-art methods by a large margin. Datasets, codes, and more results are available at https://github.com/ShinyWang33/eSL-Net-Plusplus.

* Accepted by IEEE TPAMI

Via

Access Paper or Ask Questions

Few-Shot Object Detection via Variational Feature Aggregation

Jan 31, 2023

Jiaming Han, Yuqiang Ren, Jian Ding, Ke Yan, Gui-Song Xia

Figure 1 for Few-Shot Object Detection via Variational Feature Aggregation

Figure 2 for Few-Shot Object Detection via Variational Feature Aggregation

Figure 3 for Few-Shot Object Detection via Variational Feature Aggregation

Figure 4 for Few-Shot Object Detection via Variational Feature Aggregation

Abstract:As few-shot object detectors are often trained with abundant base samples and fine-tuned on few-shot novel examples,the learned models are usually biased to base classes and sensitive to the variance of novel examples. To address this issue, we propose a meta-learning framework with two novel feature aggregation schemes. More precisely, we first present a Class-Agnostic Aggregation (CAA) method, where the query and support features can be aggregated regardless of their categories. The interactions between different classes encourage class-agnostic representations and reduce confusion between base and novel classes. Based on the CAA, we then propose a Variational Feature Aggregation (VFA) method, which encodes support examples into class-level support features for robust feature aggregation. We use a variational autoencoder to estimate class distributions and sample variational features from distributions that are more robust to the variance of support examples. Besides, we decouple classification and regression tasks so that VFA is performed on the classification branch without affecting object localization. Extensive experiments on PASCAL VOC and COCO demonstrate that our method significantly outperforms a strong baseline (up to 16\%) and previous state-of-the-art methods (4\% in average). Code will be available at: \url{https://github.com/csuhan/VFA}

* Accepted by AAAI2023

Via

Access Paper or Ask Questions

Detecting Building Changes with Off-Nadir Aerial Images

Jan 26, 2023

Chao Pang, Jiang Wu, Jian Ding, Can Song, Gui-Song Xia

Abstract:The tilted viewing nature of the off-nadir aerial images brings severe challenges to the building change detection (BCD) problem: the mismatch of the nearby buildings and the semantic ambiguity of the building facades. To tackle these challenges, we present a multi-task guided change detection network model, named as MTGCD-Net. The proposed model approaches the specific BCD problem by designing three auxiliary tasks, including: (1) a pixel-wise classification task to predict the roofs and facades of buildings; (2) an auxiliary task for learning the roof-to-footprint offsets of each building to account for the misalignment between building roof instances; and (3) an auxiliary task for learning the identical roof matching flow between bi-temporal aerial images to tackle the building roof mismatch problem. These auxiliary tasks provide indispensable and complementary building parsing and matching information. The predictions of the auxiliary tasks are finally fused to the main building change detection branch with a multi-modal distillation module. To train and test models for the BCD problem with off-nadir aerial images, we create a new benchmark dataset, named BANDON. Extensive experiments demonstrate that our model achieves superior performance over the previous state-of-the-art competitors.

* SCIENCE CHINA Information Sciences (SCIS) 2023

Via

Access Paper or Ask Questions