Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"Object Detection": models, code, and papers

Unsupervised Object Detection with LiDAR Clues

Nov 27, 2020
Hao Tian, Yuntao Chen, Jifeng Dai, Zhaoxiang Zhang, Xizhou Zhu

Despite the importance of unsupervised object detection, to the best of our knowledge, there is no previous work addressing this problem. One main issue, widely known to the community, is that object boundaries derived only from 2D image appearance are ambiguous and unreliable. To address this, we exploit LiDAR clues to aid unsupervised object detection. By exploiting the 3D scene structure, the issue of localization can be considerably mitigated. We further identify another major issue, seldom noticed by the community, that the long-tailed and open-ended (sub-)category distribution should be accommodated. In this paper, we present the first practical method for unsupervised object detection with the aid of LiDAR clues. In our approach, candidate object segments based on 3D point clouds are firstly generated. Then, an iterative segment labeling process is conducted to assign segment labels and to train a segment labeling network, which is based on features from both 2D images and 3D point clouds. The labeling process is carefully designed so as to mitigate the issue of long-tailed and open-ended distribution. The final segment labels are set as pseudo annotations for object detection network training. Extensive experiments on the large-scale Waymo Open dataset suggest that the derived unsupervised object detection method achieves reasonable accuracy compared with that of strong supervision within the LiDAR visible range. Code shall be released.

Access Paper or Ask Questions

Object Detection under Rainy Conditions for Autonomous Vehicles

Jul 10, 2020
Mazin Hnewa, Hayder Radha

Advanced automotive active-safety systems, in general, and autonomous vehicles, in particular, rely heavily on visual data to classify and localize objects such as pedestrians, traffic signs and lights, and other nearby cars, to assist the corresponding vehicles maneuver safely in their environments. However, the performance of object detection methods could degrade rather significantly under challenging weather scenarios including rainy conditions. Despite major advancements in the development of deraining approaches, the impact of rain on object detection has largely been understudied, especially in the context of autonomous driving. The main objective of this paper is to present a tutorial on state-of-the-art and emerging techniques that represent leading candidates for mitigating the influence of rainy conditions on an autonomous vehicle's ability to detect objects. Our goal includes surveying and analyzing the performance of object detection methods trained and tested using visual data captured under clear and rainy conditions. Moreover, we survey and evaluate the efficacy and limitations of leading deraining approaches, deep-learning based domain adaptation, and image translation frameworks that are being considered for addressing the problem of object detection under rainy conditions. Experimental results of a variety of the surveyed techniques are presented as part of this tutorial.

* Accepted in IEEE Signal Processing Magazine / Special Issue on Autonomous Driving 
Access Paper or Ask Questions

TS4Net: Two-Stage Sample Selective Strategy for Rotating Object Detection

Aug 06, 2021
Kai Feng, Weixing Li, Jun Han, Feng Pan, Dongdong Zheng

Rotating object detection has wide applications in aerial photographs, remote sensing images, UAVs, etc. At present, most of the rotating object detection datasets focus on the field of remote sensing, and these images are usually shot in high-altitude scenes. However, image datasets captured at low-altitude areas also should be concerned, such as drone-based datasets. So we present a low-altitude dronebased dataset, named UAV-ROD, aiming to promote the research and development in rotating object detection and UAV applications. The UAV-ROD consists of 1577 images and 30,090 instances of car category annotated by oriented bounding boxes. In particular, The UAV-ROD can be utilized for the rotating object detection, vehicle orientation recognition and object counting tasks. Compared with horizontal object detection, the regression stage of the rotation detection is a tricky problem. In this paper, we propose a rotating object detector TS4Net, which contains anchor refinement module (ARM) and two-stage sample selective strategy (TS4). The ARM can convert preseted horizontal anchors into high-quality rotated anchors through twostage anchor refinement. The TS4 module utilizes different constrained sample selective strategies to allocate positive and negative samples, which is adaptive to the regression task in different stages. Benefiting from the ARM and TS4, the TS4Net can achieve superior performance for rotating object detection solely with one preseted horizontal anchor. Extensive experimental results on UAV-ROD dataset and three remote sensing datasets DOTA, HRSC2016 and UCAS-AOD demonstrate that our method achieves competitive performance against most state-of-the-art methods.

* 12 pages, 11 figures 
Access Paper or Ask Questions

Learning to Detect Human-Object Interactions

Mar 01, 2018
Yu-Wei Chao, Yunfan Liu, Xieyang Liu, Huayi Zeng, Jia Deng

We study the problem of detecting human-object interactions (HOI) in static images, defined as predicting a human and an object bounding box with an interaction class label that connects them. HOI detection is a fundamental problem in computer vision as it provides semantic information about the interactions among the detected objects. We introduce HICO-DET, a new large benchmark for HOI detection, by augmenting the current HICO classification benchmark with instance annotations. To solve the task, we propose Human-Object Region-based Convolutional Neural Networks (HO-RCNN). At the core of our HO-RCNN is the Interaction Pattern, a novel DNN input that characterizes the spatial relations between two bounding boxes. Experiments on HICO-DET demonstrate that our HO-RCNN, by exploiting human-object spatial relations through Interaction Patterns, significantly improves the performance of HOI detection over baseline approaches.

* Accepted in WACV 2018 
Access Paper or Ask Questions

MatrixNets: A New Scale and Aspect Ratio Aware Architecture for Object Detection

Jan 09, 2020
Abdullah Rashwan, Rishav Agarwal, Agastya Kalra, Pascal Poupart

We present MatrixNets (xNets), a new deep architecture for object detection. xNets map objects with similar sizes and aspect ratios into many specialized layers, allowing xNets to provide a scale and aspect ratio aware architecture. We leverage xNets to enhance single-stage object detection frameworks. First, we apply xNets on anchor-based object detection, for which we predict object centers and regress the top-left and bottom-right corners. Second, we use MatrixNets for corner-based object detection by predicting top-left and bottom-right corners. Each corner predicts the center location of the object. We also enhance corner-based detection by replacing the embedding layer with center regression. Our final architecture achieves mAP of 47.8 on MS COCO, which is higher than its CornerNet counterpart by +5.6 mAP while also closing the gap between single-stage and two-stage detectors. The code is available at

* This is the full paper for arXiv:1908.04646 with more applications, experiments, and ablation study 
Access Paper or Ask Questions

Exploiting Web Images for Weakly Supervised Object Detection

Jul 28, 2017
Qingyi Tao, Hao Yang, Jianfei Cai

In recent years, the performance of object detection has advanced significantly with the evolving deep convolutional neural networks. However, the state-of-the-art object detection methods still rely on accurate bounding box annotations that require extensive human labelling. Object detection without bounding box annotations, i.e, weakly supervised detection methods, are still lagging far behind. As weakly supervised detection only uses image level labels and does not require the ground truth of bounding box location and label of each object in an image, it is generally very difficult to distill knowledge of the actual appearances of objects. Inspired by curriculum learning, this paper proposes an easy-to-hard knowledge transfer scheme that incorporates easy web images to provide prior knowledge of object appearance as a good starting point. While exploiting large-scale free web imagery, we introduce a sophisticated labour free method to construct a web dataset with good diversity in object appearance. After that, semantic relevance and distribution relevance are introduced and utilized in the proposed curriculum training scheme. Our end-to-end learning with the constructed web data achieves remarkable improvement across most object classes especially for the classes that are often considered hard in other works.

Access Paper or Ask Questions

Labels Are Not Perfect: Inferring Spatial Uncertainty in Object Detection

Dec 18, 2020
Di Feng, Zining Wang, Yiyang Zhou, Lars Rosenbaum, Fabian Timm, Klaus Dietmayer, Masayoshi Tomizuka, Wei Zhan

The availability of many real-world driving datasets is a key reason behind the recent progress of object detection algorithms in autonomous driving. However, there exist ambiguity or even failures in object labels due to error-prone annotation process or sensor observation noise. Current public object detection datasets only provide deterministic object labels without considering their inherent uncertainty, as does the common training process or evaluation metrics for object detectors. As a result, an in-depth evaluation among different object detection methods remains challenging, and the training process of object detectors is sub-optimal, especially in probabilistic object detection. In this work, we infer the uncertainty in bounding box labels from LiDAR point clouds based on a generative model, and define a new representation of the probabilistic bounding box through a spatial uncertainty distribution. Comprehensive experiments show that the proposed model reflects complex environmental noises in LiDAR perception and the label quality. Furthermore, we propose Jaccard IoU (JIoU) as a new evaluation metric that extends IoU by incorporating label uncertainty. We conduct an in-depth comparison among several LiDAR-based object detectors using the JIoU metric. Finally, we incorporate the proposed label uncertainty in a loss function to train a probabilistic object detector and to improve its detection accuracy. We verify our proposed methods on two public datasets (KITTI, Waymo), as well as on simulation data. Code is released at

Access Paper or Ask Questions

Detecting and Tracking Small Moving Objects in Wide Area Motion Imagery (WAMI) Using Convolutional Neural Networks (CNNs)

Nov 08, 2019
Yifan Zhou, Simon Maskell

This paper proposes an approach to detect moving objects in Wide Area Motion Imagery (WAMI), in which the objects are both small and well separated. Identifying the objects only using foreground appearance is difficult since a $100-$pixel vehicle is hard to distinguish from objects comprising the background. Our approach is based on background subtraction as an efficient and unsupervised method that is able to output the shape of objects. In order to reliably detect low contrast and small objects, we configure the background subtraction to extract foreground regions that might be objects of interest. While this dramatically increases the number of false alarms, a Convolutional Neural Network (CNN) considering both spatial and temporal information is then trained to reject the false alarms. In areas with heavy traffic, the background subtraction yields merged detections. To reduce the complexity of multi-target tracker needed, we train another CNN to predict the positions of multiple moving objects in an area. Our approach shows competitive detection performance on smaller objects relative to the state-of-the-art. We adopt a GM-PHD filter to associate detections over time and analyse the resulting performance.

* Accepted for publication in 22nd International Conference on Information Fusion (FUSION 2019) 
Access Paper or Ask Questions