"Object Detection": models, code, and papers

Learning Remote Sensing Object Detection with Single Point Supervision

May 23, 2023
Shitian He, Huanxin Zou, Yingqian Wang, Boyang Li, Xu Cao, Ning Jing

Pointly Supervised Object Detection (PSOD) has attracted considerable interest due to its lower labeling cost compared to box-level supervised object detection. However, the complex scenes and the densely packed, scale-varying objects in Remote Sensing (RS) images have hindered the development of PSOD methods in the RS field. In this paper, we make the first attempt to achieve RS object detection with single point supervision, and propose a PSOD framework tailored to RS images. Specifically, we design a point label upgrader (PLUG) to generate pseudo box labels from single point labels, and then use the pseudo boxes to supervise the optimization of existing detectors. Moreover, to handle the challenge of densely packed objects in RS images, we propose a sparse feature guided semantic prediction module which generates high-quality semantic maps by fully exploiting informative cues from sparse objects. Extensive ablation studies on the DOTA dataset validate the effectiveness of our method. Our method achieves significantly better performance than state-of-the-art image-level and point-level supervised detection methods, and narrows the performance gap between PSOD and box-level supervised object detection. Code will be available at https://github.com/heshitian/PLUG.

* 13 pages, 11 figures 
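
As a rough illustration of the point-to-box idea, the sketch below grows a region around an annotated point in a class score map and takes the bounding box of the connected component containing that point. This is a minimal stand-in for PLUG under assumed inputs (the score map, threshold, and function name are hypothetical), not the authors' implementation.

```python
# Minimal sketch of upgrading a single point label to a pseudo box.
# Illustrative stand-in for PLUG; threshold and score map are hypothetical.
import numpy as np
from scipy import ndimage

def point_to_pseudo_box(score_map: np.ndarray, point_xy, thresh: float = 0.5):
    """Grow a region around an annotated point and return its bounding box.

    score_map: (H, W) class-probability map in [0, 1].
    point_xy:  (x, y) single-point annotation for one object.
    """
    x, y = point_xy
    mask = score_map >= thresh                     # binarize the semantic map
    labels, _ = ndimage.label(mask)                # connected components
    comp = labels[y, x]
    if comp == 0:                                  # point fell on background:
        return x, y, x, y                          # degenerate box; caller may relax thresh
    ys, xs = np.nonzero(labels == comp)            # pixels of the point's component
    return xs.min(), ys.min(), xs.max(), ys.max()  # (x1, y1, x2, y2) pseudo box

# Toy usage with a synthetic blob around the annotated point.
H = W = 64
yy, xx = np.mgrid[0:H, 0:W]
score = np.exp(-((xx - 30) ** 2 + (yy - 20) ** 2) / 50.0)
print(point_to_pseudo_box(score, (30, 20)))
```

The resulting pseudo boxes would then supervise a standard detector, as the abstract describes.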

Semi-Supervised and Long-Tailed Object Detection with CascadeMatch

May 24, 2023
Yuhang Zang, Kaiyang Zhou, Chen Huang, Chen Change Loy

This paper focuses on long-tailed object detection in the semi-supervised learning setting, which poses realistic challenges but has rarely been studied in the literature. We propose a novel pseudo-labeling-based detector called CascadeMatch. Our detector features a cascade network architecture with multi-stage detection heads and progressive confidence thresholds. To avoid manually tuning the thresholds, we design a new adaptive pseudo-label mining mechanism that automatically identifies suitable values from data. To mitigate confirmation bias, where a model is negatively reinforced by its own incorrect pseudo-labels, each detection head is trained on the ensemble pseudo-labels of all detection heads. Experiments on two long-tailed datasets, LVIS and COCO-LT, demonstrate that CascadeMatch surpasses existing state-of-the-art semi-supervised approaches -- across a wide range of detection architectures -- in handling long-tailed object detection. For instance, CascadeMatch outperforms Unbiased Teacher by 1.9 AP Fix on LVIS when using a ResNet50-based Cascade R-CNN structure, and by 1.7 AP Fix when using Sparse R-CNN with a Transformer encoder. We also show that CascadeMatch can even handle the challenging sparsely annotated object detection problem.

* International Journal of Computer Vision (IJCV), 2023 
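
A hedged sketch of two CascadeMatch ingredients follows: per-class thresholds mined from the data, and pseudo-labels ensembled across heads. The adaptive rule below (per-class mean confidence) is an illustrative guess, not the paper's exact mining mechanism.

```python
import numpy as np

def adaptive_thresholds(confidences, labels, num_classes):
    """Estimate one pseudo-label threshold per class from the data."""
    thr = np.full(num_classes, 0.5)            # fallback for unseen classes
    for c in range(num_classes):
        conf_c = confidences[labels == c]
        if len(conf_c):
            thr[c] = conf_c.mean()             # data-driven, no manual tuning
    return thr

def ensemble_pseudo_labels(head_probs):
    """Average class probabilities over all cascade heads to reduce
    confirmation bias from any single head's mistakes."""
    probs = np.mean(head_probs, axis=0)        # (num_boxes, num_classes)
    return probs.argmax(axis=1), probs.max(axis=1)

# Toy usage: 3 heads, 5 candidate boxes, 4 classes.
rng = np.random.default_rng(0)
head_probs = rng.dirichlet(np.ones(4), size=(3, 5))
labels, conf = ensemble_pseudo_labels(head_probs)
thr = adaptive_thresholds(conf, labels, num_classes=4)
keep = conf >= thr[labels]                     # filter candidate pseudo-labels
print(labels, conf.round(2), keep)
```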

MonoTDP: Twin Depth Perception for Monocular 3D Object Detection in Adverse Scenes

May 25, 2023
Xingyuan Li, Jinyuan Liu, Yixin Lei, Long Ma, Xin Fan, Risheng Liu

3D object detection plays a crucial role in numerous intelligent vision systems. Detection in the open world inevitably encounters adverse scenes such as dense fog, heavy rain, and low light. Although existing efforts, which primarily diversify network architectures or training schemes, have brought significant progress in 3D object detection, most of these learnable modules fail in adverse scenes, hindering detection performance. To address this issue, this paper proposes a monocular 3D detection model designed to perceive twin depth in adverse scenes, termed MonoTDP, which effectively mitigates the degradation of detection performance in various harsh environments. Specifically, we first introduce an adaptive learning strategy that helps the model handle uncontrollable weather conditions, significantly resisting degradation caused by various degrading factors. Then, to address the loss of depth and content in adverse regions, we propose a novel twin depth perception module that simultaneously estimates scene depth and object depth, enabling the integration of scene-level and object-level features. Additionally, we assemble a new adverse 3D object detection dataset encompassing a wide range of challenging scenes, including rain, fog, and low-light conditions, with each type of scene containing 7,481 images. Experimental results demonstrate that our proposed method outperforms current state-of-the-art approaches by an average of 3.12% in terms of AP_R40 for the car category across various adverse environments.

* 10 pages, 5 figures, 3 tables 
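
As a purely illustrative sketch of the "twin depth" idea, the module below estimates scene-level and object-level depth from shared features with two parallel heads and fuses their outputs. Layer sizes and the fusion rule are hypothetical; this is not the MonoTDP architecture.

```python
import torch
import torch.nn as nn

class TwinDepthHead(nn.Module):
    def __init__(self, in_ch: int = 256):
        super().__init__()
        self.scene_depth = nn.Conv2d(in_ch, 1, kernel_size=3, padding=1)
        self.object_depth = nn.Conv2d(in_ch, 1, kernel_size=3, padding=1)
        self.fuse = nn.Conv2d(2, 1, kernel_size=1)  # learn to weight the two cues

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        d_scene = self.scene_depth(feats)    # coarse geometry of the whole scene
        d_object = self.object_depth(feats)  # depth refined around object regions
        return self.fuse(torch.cat([d_scene, d_object], dim=1))

feats = torch.randn(1, 256, 48, 160)         # e.g. a monocular backbone feature map
print(TwinDepthHead()(feats).shape)          # -> torch.Size([1, 1, 48, 160])
```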

Linear Object Detection in Document Images using Multiple Object Tracking

May 26, 2023
Philippe Bernet, Joseph Chazalon, Edwin Carlinet, Alexandre Bourquelot, Elodie Puybareau

Linear objects convey substantial information about document structure, but are challenging to detect accurately because of degradation (curved, erased) or decoration (doubled, dashed). Many approaches can recover some vector representation, but only one closed-source technique introduced in 1994, based on Kalman filters (a particular case of Multiple Object Tracking algorithm), performs a pixel-accurate instance segmentation of linear objects and enables their selective removal from the original image. We aim to re-popularize this approach and propose: 1. a framework for accurate instance segmentation of linear objects in document images using Multiple Object Tracking (MOT); 2. document image datasets and metrics which enable both vector- and pixel-based evaluation of linear object detection; 3. performance measures of MOT approaches against modern segment detectors; 4. performance measures of various tracking strategies, exhibiting alternatives to the original Kalman filter approach; and 5. an open-source implementation of a detector which can discriminate instances of curved, erased, dashed, intersecting and/or overlapping linear objects.

* Accepted to ICDAR 2023 
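
To make the MOT view concrete: scan the image column by column and track each line's row position (and slope) with a constant-velocity Kalman filter, as in the minimal sketch below. The state model and noise parameters are hypothetical, not the paper's configuration.

```python
import numpy as np

def track_line(observations, q=1e-3, r=1.0):
    """observations: per-column row measurements of one linear object."""
    x = np.array([observations[0], 0.0])       # state: [row, slope]
    P = np.eye(2)
    F = np.array([[1.0, 1.0], [0.0, 1.0]])     # advance one column per step
    H = np.array([[1.0, 0.0]])                 # we only observe the row
    Q, R = q * np.eye(2), np.array([[r]])
    track = []
    for z in observations[1:]:
        x, P = F @ x, F @ P @ F.T + Q          # predict
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)         # Kalman gain
        x = x + K @ (np.array([z]) - H @ x)    # update with the new column
        P = (np.eye(2) - K @ H) @ P
        track.append(x[0])
    return np.array(track)

# Toy usage: a noisy line of slope 0.2 across 50 columns.
cols = np.arange(50)
obs = 10 + 0.2 * cols + np.random.default_rng(1).normal(0, 0.5, 50)
print(track_line(obs)[-5:])                    # smoothed row estimates
```

Associating per-column measurements with multiple concurrent filters is what lets the tracker separate intersecting or overlapping lines.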

YOLOv3 with Spatial Pyramid Pooling for Object Detection with Unmanned Aerial Vehicles

May 21, 2023
Wahyu Pebrianto, Panca Mudjirahardjo, Sholeh Hadi Pramono, Rahmadwati, Raden Arief Setyawan

Object detection with Unmanned Aerial Vehicles (UAVs) has attracted much attention in computer vision research. However, it is not easy to accurately detect objects in data obtained from UAVs: images are captured from very high altitudes and are therefore dominated by small objects that are difficult to detect. Motivated by this challenge, we aim to improve the performance of the one-stage detector YOLOv3 by adding a Spatial Pyramid Pooling (SPP) layer at the end of the Darknet-53 backbone to obtain a more efficient feature extraction process in object detection tasks with UAVs. We also conducted an evaluation study of different YOLOv3 variants, including YOLOv3 with SPP, YOLOv3, and YOLOv3-tiny, which we analyzed on the VisDrone2019-Det dataset. We show that YOLOv3 with SPP achieves mAP 0.6% higher than YOLOv3 and 26.6% higher than YOLOv3-tiny at a 640x640 input scale, and is better able to maintain accuracy across different input image scales than the other YOLOv3 variants. These results show that adding SPP layers to YOLOv3 is an efficient way to improve object detection performance on data obtained from UAVs.
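
For reference, a minimal PyTorch sketch of the SPP block this paper adds: parallel max pooling at several kernel sizes with stride 1 over the same feature map, concatenated with the input along channels. The kernel sizes 5/9/13 follow the common YOLOv3-SPP configuration; the feature-map shape below is illustrative.

```python
import torch
import torch.nn as nn

class SPP(nn.Module):
    def __init__(self, kernel_sizes=(5, 9, 13)):
        super().__init__()
        self.pools = nn.ModuleList(
            [nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)
             for k in kernel_sizes]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Concatenate the input with its multi-scale pooled versions,
        # enlarging the receptive field without changing resolution.
        return torch.cat([x] + [p(x) for p in self.pools], dim=1)

x = torch.randn(1, 512, 20, 20)   # feature map at the end of Darknet-53
print(SPP()(x).shape)             # -> torch.Size([1, 2048, 20, 20])
```

Because stride is 1 and padding is k // 2, spatial resolution is preserved while the channel count grows by one copy per pooling scale.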

Online Open-set Semi-supervised Object Detection via Semi-supervised Outlier Filtering

May 23, 2023
Zerun Wang, Ling Xiao, Liuyu Xiang, Zhaotian Weng, Toshihiko Yamasaki

Open-set semi-supervised object detection (OSSOD) methods aim to exploit practical unlabeled datasets that contain out-of-distribution (OOD) instances for object detection. The main challenge in OSSOD is distinguishing and filtering OOD instances from in-distribution (ID) instances during pseudo-labeling. Previous work addresses this problem with an offline OOD detection network trained only on labeled data; however, the scarcity of available data limits the potential for improvement, and training the network separately is inefficient. To alleviate these issues, this paper proposes a novel end-to-end online framework that improves performance and efficiency by mining more valuable instances from unlabeled data. Specifically, we first propose a semi-supervised OOD detection strategy to mine valuable ID and OOD instances in unlabeled datasets for training. Then, we build an online, end-to-end trainable OSSOD framework by integrating the OOD detection head into the object detector, making it jointly trainable with the original detection task. Our experimental results show that our method works well on several benchmarks, including the partially labeled COCO dataset with open-set classes and the fully labeled COCO dataset with the additional large-scale open-set unlabeled dataset, OpenImages. Compared with previous OSSOD methods, our approach achieves the best performance on COCO with OpenImages, improving by +0.94 mAP to reach 44.07 mAP.
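
A hedged sketch of the filtering step: each candidate box carries both a class confidence and an in-distribution probability from the jointly trained OOD head, and only boxes confident on both axes become pseudo-labels. The thresholds and tensor layout below are illustrative, not the paper's settings.

```python
import torch

def filter_pseudo_labels(boxes, cls_scores, id_probs,
                         cls_thresh=0.7, id_thresh=0.5):
    """boxes: (N, 4); cls_scores, id_probs: (N,) in [0, 1]."""
    keep = (cls_scores >= cls_thresh) & (id_probs >= id_thresh)
    return boxes[keep], cls_scores[keep]

boxes = torch.rand(6, 4)
cls_scores = torch.tensor([0.9, 0.8, 0.95, 0.4, 0.85, 0.75])
id_probs = torch.tensor([0.9, 0.2, 0.8, 0.9, 0.6, 0.3])  # OOD head output
kept_boxes, kept_scores = filter_pseudo_labels(boxes, cls_scores, id_probs)
print(len(kept_boxes), "of", len(boxes), "boxes kept as pseudo-labels")
```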

Leveraging object detection for the identification of lung cancer

May 25, 2023
Karthick Prasad Gunasekaran

Lung cancer poses a significant global public health challenge, and early detection is crucial for improved patient outcomes. Recent advancements in deep learning algorithms have shown promising results in medical image analysis. This study explores the application of object detection, particularly YOLOv5, an advanced object detection system, to medical imaging for lung cancer identification. To train and evaluate the algorithm, a dataset comprising chest X-rays and corresponding annotations was obtained from Kaggle. The YOLOv5 model was employed to train an algorithm capable of detecting cancerous lung lesions. The training process involved optimizing hyperparameters and utilizing augmentation techniques to enhance the model's performance. The trained YOLOv5 model exhibited exceptional proficiency in identifying lung cancer lesions, displaying high accuracy and recall rates. It successfully pinpointed malignant areas in chest radiographs, as validated on a separate test set where it outperformed previous techniques. Additionally, the YOLOv5 model demonstrated computational efficiency, enabling real-time detection and making it suitable for integration into clinical procedures. This proposed approach holds promise in assisting radiologists in the early discovery and diagnosis of lung cancer, ultimately leading to prompt treatment and improved patient outcomes.

* International Advanced Research Journal in Science, Engineering and Technology, Vol. 7, Issue 5, May 2020 
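
For a sense of the workflow, the snippet below runs a trained YOLOv5 model on a chest X-ray via YOLOv5's public torch.hub interface. The checkpoint and image names are hypothetical placeholders, not artifacts from the paper.

```python
import torch

# Load custom weights produced by YOLOv5 training (path is illustrative).
model = torch.hub.load("ultralytics/yolov5", "custom", path="lesion_best.pt")
model.conf = 0.25                      # confidence threshold for detections

results = model("chest_xray_001.png")  # hypothetical input image
results.print()                        # class, confidence, box per lesion
boxes = results.xyxy[0]                # (N, 6): x1, y1, x2, y2, conf, cls
```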

Instance-Aware Repeat Factor Sampling for Long-Tailed Object Detection

May 14, 2023
Burhaneddin Yaman, Tanvir Mahmud, Chun-Hao Liu

We propose an embarrassingly simple method -- instance-aware repeat factor sampling (IRFS) -- to address the problem of imbalanced data in long-tailed object detection. Imbalanced datasets in real-world object detection often exhibit a large disparity in the number of instances per class. To improve the generalization performance of object detection models on rare classes, various data sampling techniques have been proposed. Repeat factor sampling (RFS) has shown promise due to its simplicity and effectiveness, but it completely neglects instance counts and relies solely on the image count during the re-sampling process. However, instance counts may vary immensely across classes with similar image counts. Such variation highlights the importance of both image and instance counts for addressing long-tail distributions. Thus, we propose IRFS, which unifies instance and image counts in the re-sampling process to account for both perspectives of the imbalance in long-tailed datasets. Our method shows promising results on the challenging LVIS v1.0 benchmark dataset over various architectures and backbones, demonstrating its effectiveness in improving the performance of object detection models on rare classes, with a relative $+50\%$ average precision (AP) improvement over the RFS counterpart. IRFS can serve as a strong baseline and be easily incorporated into existing long-tailed frameworks.
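
A worked sketch of repeat factor sampling: plain RFS uses only the image-level frequency f_img(c) of each class, via r(c) = max(1, sqrt(t / f_img(c))). The instance-aware variant below also folds in instance frequency through a geometric mean; this combination rule is a plausible reading of IRFS, hedged, not the paper's verbatim formula.

```python
import numpy as np

def repeat_factor(f_img, f_ins=None, t=1e-3):
    """Per-class repeat factor; f_* are frequencies in [0, 1]."""
    if f_ins is None:                    # plain RFS: image frequency only
        f = f_img
    else:                                # instance-aware: combine both views
        f = np.sqrt(f_img * f_ins)       # geometric mean of the two frequencies
    return np.maximum(1.0, np.sqrt(t / f))

# Two classes appear in the same number of images, but the second is far
# sparser per image; only the instance-aware factor tells them apart.
f_img = np.array([0.001, 0.001])
f_ins = np.array([0.01, 0.0001])
print(repeat_factor(f_img))              # RFS: identical factors
print(repeat_factor(f_img, f_ins))       # IRFS-style: sparse class upsampled more
```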

RHINO: Rotated DETR with Dynamic Denoising via Hungarian Matching for Oriented Object Detection

May 15, 2023
Hakjin Lee, Minki Song, Jamyoung Koo, Junghoon Seo

With the publication of DINO, a variant of the Detection Transformer (DETR), Detection Transformers are breaking records on object detection benchmarks thanks to their end-to-end design and scalability. However, the extension of DETR to oriented object detection has not been thoroughly studied, although further benefits are expected from its end-to-end architecture, such as removing NMS and anchor-related costs. In this paper, we propose the first strong DINO-based baseline for oriented object detection. We find that straightforward employment of DETRs for oriented object detection does not guarantee non-duplicate predictions, and propose a simple cost to mitigate this. Furthermore, we introduce a $\textit{dynamic denoising}$ strategy that uses Hungarian matching to filter redundant noised queries, and $\textit{query alignment}$ to preserve matching consistency across Transformer decoder layers. Our proposed model outperforms previous rotated DETRs and other counterparts, achieving state-of-the-art performance on the DOTA-v1.0/v1.5/v2.0 and DIOR-R benchmarks.

* State-of-the-art Rotated Object Detector in DOTA v1.0/v1.5/v2.0 and DIOR-R 
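
A hedged sketch of the dynamic-denoising idea: run Hungarian matching between noised queries and ground-truth boxes and keep only the one-to-one matches, discarding redundant noised queries. The cost below (L1 distance over box centers) is illustrative; RHINO's actual matching cost for oriented boxes is richer.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def filter_noised_queries(query_boxes, gt_boxes):
    """query_boxes: (Q, 4), gt_boxes: (G, 4); returns kept query indices."""
    # Illustrative cost: L1 distance between box centers.
    cost = np.abs(query_boxes[:, None, :2] - gt_boxes[None, :, :2]).sum(-1)
    q_idx, g_idx = linear_sum_assignment(cost)  # optimal one-to-one matching
    return q_idx                                 # one surviving query per ground truth

queries = np.random.default_rng(2).uniform(0, 1, size=(8, 4))
gts = np.array([[0.3, 0.3, 0.1, 0.1], [0.7, 0.6, 0.2, 0.2]])
print(filter_noised_queries(queries, gts))       # 2 kept query indices
```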

Realistically distributing object placements in synthetic training data improves the performance of vision-based object detection models

May 24, 2023
Setareh Dabiri, Vasileios Lioutas, Berend Zwartsenberg, Yunpeng Liu, Matthew Niedoba, Xiaoxuan Liang, Dylan Green, Justice Sefas, Jonathan Wilder Lavington, Frank Wood, Adam Scibior

When training object detection models on synthetic data, it is important to make the distribution of the synthetic data as close as possible to the distribution of real data. We specifically investigate the impact of the object placement distribution, keeping all other aspects of the synthetic data fixed. Our experiment, training a 3D vehicle detection model in CARLA and testing on KITTI, demonstrates a substantial improvement resulting from a more realistic object placement distribution.
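
One simple way to realize this idea, sketched below: instead of placing synthetic vehicles uniformly at random, fit a density model to object placements observed in real data and sample synthetic placements from it. The KDE choice and the coordinate setup are illustrative assumptions, not the paper's pipeline.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(3)
# Hypothetical real-world (x, y) vehicle positions in ground-plane metres.
real_xy = np.vstack([rng.normal(0, 2, 500), rng.normal(20, 8, 500)])

placement_model = gaussian_kde(real_xy)       # density of realistic placements
synthetic_xy = placement_model.resample(100)  # placements for synthetic scenes
print(synthetic_xy.shape)                     # -> (2, 100)
```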