Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Gui-Song Xia

Rethinking Scale Imbalance in Semi-supervised Object Detection for Aerial Images

Oct 23, 2023

Ruixiang Zhang, Chang Xu, Fang Xu, Wen Yang, Guangjun He, Huai Yu, Gui-Song Xia

Figure 1 for Rethinking Scale Imbalance in Semi-supervised Object Detection for Aerial Images

Figure 2 for Rethinking Scale Imbalance in Semi-supervised Object Detection for Aerial Images

Figure 3 for Rethinking Scale Imbalance in Semi-supervised Object Detection for Aerial Images

Figure 4 for Rethinking Scale Imbalance in Semi-supervised Object Detection for Aerial Images

Abstract:This paper focuses on the scale imbalance problem of semi-supervised object detection(SSOD) in aerial images. Compared to natural images, objects in aerial images show smaller sizes and larger quantities per image, increasing the difficulty of manual annotation. Meanwhile, the advanced SSOD technique can train superior detectors by leveraging limited labeled data and massive unlabeled data, saving annotation costs. However, as an understudied task in aerial images, SSOD suffers from a drastic performance drop when facing a large proportion of small objects. By analyzing the predictions between small and large objects, we identify three imbalance issues caused by the scale bias, i.e., pseudo-label imbalance, label assignment imbalance, and negative learning imbalance. To tackle these issues, we propose a novel Scale-discriminative Semi-Supervised Object Detection (S^3OD) learning pipeline for aerial images. In our S^3OD, three key components, Size-aware Adaptive Thresholding (SAT), Size-rebalanced Label Assignment (SLA), and Teacher-guided Negative Learning (TNL), are proposed to warrant scale unbiased learning. Specifically, SAT adaptively selects appropriate thresholds to filter pseudo-labels for objects at different scales. SLA balances positive samples of objects at different scales through resampling and reweighting. TNL alleviates the imbalance in negative samples by leveraging information generated by a teacher model. Extensive experiments conducted on the DOTA-v1.5 benchmark demonstrate the superiority of our proposed methods over state-of-the-art competitors. Codes will be released soon.

Via

Access Paper or Ask Questions

CrossZoom: Simultaneously Motion Deblurring and Event Super-Resolving

Sep 29, 2023

Chi Zhang, Xiang Zhang, Mingyuan Lin, Cheng Li, Chu He, Wen Yang, Gui-Song Xia, Lei Yu

Figure 1 for CrossZoom: Simultaneously Motion Deblurring and Event Super-Resolving

Figure 2 for CrossZoom: Simultaneously Motion Deblurring and Event Super-Resolving

Figure 3 for CrossZoom: Simultaneously Motion Deblurring and Event Super-Resolving

Figure 4 for CrossZoom: Simultaneously Motion Deblurring and Event Super-Resolving

Abstract:Even though the collaboration between traditional and neuromorphic event cameras brings prosperity to frame-event based vision applications, the performance is still confined by the resolution gap crossing two modalities in both spatial and temporal domains. This paper is devoted to bridging the gap by increasing the temporal resolution for images, i.e., motion deblurring, and the spatial resolution for events, i.e., event super-resolving, respectively. To this end, we introduce CrossZoom, a novel unified neural Network (CZ-Net) to jointly recover sharp latent sequences within the exposure period of a blurry input and the corresponding High-Resolution (HR) events. Specifically, we present a multi-scale blur-event fusion architecture that leverages the scale-variant properties and effectively fuses cross-modality information to achieve cross-enhancement. Attention-based adaptive enhancement and cross-interaction prediction modules are devised to alleviate the distortions inherent in Low-Resolution (LR) events and enhance the final results through the prior blur-event complementary information. Furthermore, we propose a new dataset containing HR sharp-blurry images and the corresponding HR-LR event streams to facilitate future research. Extensive qualitative and quantitative experiments on synthetic and real-world datasets demonstrate the effectiveness and robustness of the proposed method. Codes and datasets are released at https://bestrivenzc.github.io/CZ-Net/.

Via

Access Paper or Ask Questions

QuadricsNet: Learning Concise Representation for Geometric Primitives in Point Clouds

Sep 25, 2023

Ji Wu, Huai Yu, Wen Yang, Gui-Song Xia

Figure 1 for QuadricsNet: Learning Concise Representation for Geometric Primitives in Point Clouds

Figure 2 for QuadricsNet: Learning Concise Representation for Geometric Primitives in Point Clouds

Figure 3 for QuadricsNet: Learning Concise Representation for Geometric Primitives in Point Clouds

Figure 4 for QuadricsNet: Learning Concise Representation for Geometric Primitives in Point Clouds

Abstract:This paper presents a novel framework to learn a concise geometric primitive representation for 3D point clouds. Different from representing each type of primitive individually, we focus on the challenging problem of how to achieve a concise and uniform representation robustly. We employ quadrics to represent diverse primitives with only 10 parameters and propose the first end-to-end learning-based framework, namely QuadricsNet, to parse quadrics in point clouds. The relationships between quadrics mathematical formulation and geometric attributes, including the type, scale and pose, are insightfully integrated for effective supervision of QuaidricsNet. Besides, a novel pattern-comprehensive dataset with quadrics segments and objects is collected for training and evaluation. Experiments demonstrate the effectiveness of our concise representation and the robustness of QuadricsNet. Our code is available at \url{https://github.com/MichaelWu99-lab/QuadricsNet}

* Submitted to ICRA 2024. 7 pages

Via

Access Paper or Ask Questions

2D-3D Pose Tracking with Multi-View Constraints

Sep 20, 2023

Huai Yu, Kuangyi Chen, Wen Yang, Sebastian Scherer, Gui-Song Xia

Abstract:Camera localization in 3D LiDAR maps has gained increasing attention due to its promising ability to handle complex scenarios, surpassing the limitations of visual-only localization methods. However, existing methods mostly focus on addressing the cross-modal gaps, estimating camera poses frame by frame without considering the relationship between adjacent frames, which makes the pose tracking unstable. To alleviate this, we propose to couple the 2D-3D correspondences between adjacent frames using the 2D-2D feature matching, establishing the multi-view geometrical constraints for simultaneously estimating multiple camera poses. Specifically, we propose a new 2D-3D pose tracking framework, which consists: a front-end hybrid flow estimation network for consecutive frames and a back-end pose optimization module. We further design a cross-modal consistency-based loss to incorporate the multi-view constraints during the training and inference process. We evaluate our proposed framework on the KITTI and Argoverse datasets. Experimental results demonstrate its superior performance compared to existing frame-by-frame 2D-3D pose tracking methods and state-of-the-art vision-only pose tracking algorithms. More online pose tracking videos are available at \url{https://youtu.be/yfBRdg7gw5M}

* This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Via

Access Paper or Ask Questions

Patched Line Segment Learning for Vector Road Mapping

Sep 06, 2023

Jiakun Xu, Bowen Xu, Gui-Song Xia, Liang Dong, Nan Xue

Abstract:This paper presents a novel approach to computing vector road maps from satellite remotely sensed images, building upon a well-defined Patched Line Segment (PaLiS) representation for road graphs that holds geometric significance. Unlike prevailing methods that derive road vector representations from satellite images using binary masks or keypoints, our method employs line segments. These segments not only convey road locations but also capture their orientations, making them a robust choice for representation. More precisely, given an input image, we divide it into non-overlapping patches and predict a suitable line segment within each patch. This strategy enables us to capture spatial and structural cues from these patch-based line segments, simplifying the process of constructing the road network graph without the necessity of additional neural networks for connectivity. In our experiments, we demonstrate how an effective representation of a road graph significantly enhances the performance of vector road mapping on established benchmarks, without requiring extensive modifications to the neural network architecture. Furthermore, our method achieves state-of-the-art performance with just 6 GPU hours of training, leading to a substantial 32-fold reduction in training costs in terms of GPU hours.

Via

Access Paper or Ask Questions

On the Robustness of Object Detection Models in Aerial Images

Aug 29, 2023

Haodong He, Jian Ding, Gui-Song Xia

Figure 1 for On the Robustness of Object Detection Models in Aerial Images

Figure 2 for On the Robustness of Object Detection Models in Aerial Images

Figure 3 for On the Robustness of Object Detection Models in Aerial Images

Figure 4 for On the Robustness of Object Detection Models in Aerial Images

Abstract:The robustness of object detection models is a major concern when applied to real-world scenarios. However, the performance of most object detection models degrades when applied to images subjected to corruptions, since they are usually trained and evaluated on clean datasets. Enhancing the robustness of object detection models is of utmost importance, especially for those designed for aerial images, which feature complex backgrounds, substantial variations in scales and orientations of objects. This paper addresses the challenge of assessing the robustness of object detection models in aerial images, with a specific emphasis on scenarios where images are affected by clouds. In this study, we introduce two novel benchmarks based on DOTA-v1.0. The first benchmark encompasses 19 prevalent corruptions, while the second focuses on cloud-corrupted images-a phenomenon uncommon in natural pictures yet frequent in aerial photography. We systematically evaluate the robustness of mainstream object detection models and perform numerous ablation experiments. Through our investigations, we find that enhanced model architectures, larger networks, well-crafted modules, and judicious data augmentation strategies collectively enhance the robustness of aerial object detection models. The benchmarks we propose and our comprehensive experimental analyses can facilitate research on robust object detection in aerial images. Codes and datasets are available at: (https://github.com/hehaodong530/DOTA-C)

* 16 pages

Via

Access Paper or Ask Questions

Generalizing Event-Based Motion Deblurring in Real-World Scenarios

Aug 11, 2023

Xiang Zhang, Lei Yu, Wen Yang, Jianzhuang Liu, Gui-Song Xia

Figure 1 for Generalizing Event-Based Motion Deblurring in Real-World Scenarios

Figure 2 for Generalizing Event-Based Motion Deblurring in Real-World Scenarios

Figure 3 for Generalizing Event-Based Motion Deblurring in Real-World Scenarios

Figure 4 for Generalizing Event-Based Motion Deblurring in Real-World Scenarios

Abstract:Event-based motion deblurring has shown promising results by exploiting low-latency events. However, current approaches are limited in their practical usage, as they assume the same spatial resolution of inputs and specific blurriness distributions. This work addresses these limitations and aims to generalize the performance of event-based deblurring in real-world scenarios. We propose a scale-aware network that allows flexible input spatial scales and enables learning from different temporal scales of motion blur. A two-stage self-supervised learning scheme is then developed to fit real-world data distribution. By utilizing the relativity of blurriness, our approach efficiently ensures the restored brightness and structure of latent images and further generalizes deblurring performance to handle varying spatial and temporal scales of motion blur in a self-distillation manner. Our method is extensively evaluated, demonstrating remarkable performance, and we also introduce a real-world dataset consisting of multi-scale blurry frames and events to facilitate research in event-based deblurring.

* Accepted by ICCV 2023

Via

Access Paper or Ask Questions

Towards Generic and Controllable Attacks Against Object Detection

Jul 23, 2023

Guopeng Li, Yue Xu, Jian Ding, Gui-Song Xia

Abstract:Existing adversarial attacks against Object Detectors (ODs) suffer from two inherent limitations. Firstly, ODs have complicated meta-structure designs, hence most advanced attacks for ODs concentrate on attacking specific detector-intrinsic structures, which makes it hard for them to work on other detectors and motivates us to design a generic attack against ODs. Secondly, most works against ODs make Adversarial Examples (AEs) by generalizing image-level attacks from classification to detection, which brings redundant computations and perturbations in semantically meaningless areas (e.g., backgrounds) and leads to an emergency for seeking controllable attacks for ODs. To this end, we propose a generic white-box attack, LGP (local perturbations with adaptively global attacks), to blind mainstream object detectors with controllable perturbations. For a detector-agnostic attack, LGP tracks high-quality proposals and optimizes three heterogeneous losses simultaneously. In this way, we can fool the crucial components of ODs with a part of their outputs without the limitations of specific structures. Regarding controllability, we establish an object-wise constraint that exploits foreground-background separation adaptively to induce the attachment of perturbations to foregrounds. Experimentally, the proposed LGP successfully attacked sixteen state-of-the-art object detectors on MS-COCO and DOTA datasets, with promising imperceptibility and transferability obtained. Codes are publicly released in https://github.com/liguopeng0923/LGP.git

Via

Access Paper or Ask Questions

Volumetric Wireframe Parsing from Neural Attraction Fields

Jul 14, 2023

Nan Xue, Bin Tan, Yuxi Xiao, Liang Dong, Gui-Song Xia, Tianfu Wu

Figure 1 for Volumetric Wireframe Parsing from Neural Attraction Fields

Figure 2 for Volumetric Wireframe Parsing from Neural Attraction Fields

Figure 3 for Volumetric Wireframe Parsing from Neural Attraction Fields

Figure 4 for Volumetric Wireframe Parsing from Neural Attraction Fields

Abstract:The primal sketch is a fundamental representation in Marr's vision theory, which allows for parsimonious image-level processing from 2D to 2.5D perception. This paper takes a further step by computing 3D primal sketch of wireframes from a set of images with known camera poses, in which we take the 2D wireframes in multi-view images as the basis to compute 3D wireframes in a volumetric rendering formulation. In our method, we first propose a NEural Attraction (NEAT) Fields that parameterizes the 3D line segments with coordinate Multi-Layer Perceptrons (MLPs), enabling us to learn the 3D line segments from 2D observation without incurring any explicit feature correspondences across views. We then present a novel Global Junction Perceiving (GJP) module to perceive meaningful 3D junctions from the NEAT Fields of 3D line segments by optimizing a randomly initialized high-dimensional latent array and a lightweight decoding MLP. Benefitting from our explicit modeling of 3D junctions, we finally compute the primal sketch of 3D wireframes by attracting the queried 3D line segments to the 3D junctions, significantly simplifying the computation paradigm of 3D wireframe parsing. In experiments, we evaluate our approach on the DTU and BlendedMVS datasets with promising performance obtained. As far as we know, our method is the first approach to achieve high-fidelity 3D wireframe parsing without requiring explicit matching.

* Technical report; Video can be found at https://youtu.be/qtBQYbOpVpc

Via

Access Paper or Ask Questions

Depth and DOF Cues Make A Better Defocus Blur Detector

Jun 20, 2023

Yuxin Jin, Ming Qian, Jincheng Xiong, Nan Xue, Gui-Song Xia

Figure 1 for Depth and DOF Cues Make A Better Defocus Blur Detector

Figure 2 for Depth and DOF Cues Make A Better Defocus Blur Detector

Figure 3 for Depth and DOF Cues Make A Better Defocus Blur Detector

Figure 4 for Depth and DOF Cues Make A Better Defocus Blur Detector

Abstract:Defocus blur detection (DBD) separates in-focus and out-of-focus regions in an image. Previous approaches mistakenly mistook homogeneous areas in focus for defocus blur regions, likely due to not considering the internal factors that cause defocus blur. Inspired by the law of depth, depth of field (DOF), and defocus, we propose an approach called D-DFFNet, which incorporates depth and DOF cues in an implicit manner. This allows the model to understand the defocus phenomenon in a more natural way. Our method proposes a depth feature distillation strategy to obtain depth knowledge from a pre-trained monocular depth estimation model and uses a DOF-edge loss to understand the relationship between DOF and depth. Our approach outperforms state-of-the-art methods on public benchmarks and a newly collected large benchmark dataset, EBD. Source codes and EBD dataset are available at: https:github.com/yuxinjin-whu/D-DFFNet.

* Code: https://github.com/yuxinjin-whu/D-DFFNet

Via

Access Paper or Ask Questions