Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"Object Detection": models, code, and papers

Improved detection of small objects in road network sequences

May 18, 2021
Iván García, Rafael Marcos Luque, Ezequiel López

The vast number of existing IP cameras in current road networks is an opportunity to take advantage of the captured data and analyze the video and detect any significant events. For this purpose, it is necessary to detect moving vehicles, a task that was carried out using classical artificial vision techniques until a few years ago. Nowadays, significant improvements have been obtained by deep learning networks. Still, object detection is considered one of the leading open issues within computer vision. The current scenario is constantly evolving, and new models and techniques are appearing trying to improve this field. In particular, new problems and drawbacks appear regarding detecting small objects, which correspond mainly to the vehicles that appear in the road scenes. All this means that new solutions that try to improve the low detection rate of small elements are essential. Among the different emerging research lines, this work focuses on the detection of small objects. In particular, our proposal aims to vehicle detection from images captured by video surveillance cameras. In this work, we propose a new procedure for detecting small-scale objects by applying super-resolution processes based on detections performed by convolutional neural networks \emph{(CNN)}. The neural network is integrated with processes that are in charge of increasing the resolution of the images to improve the object detection performance. This solution has been tested for a set of traffic images containing elements of different scales to test the efficiency according to the detections obtained by the model, thus demonstrating that our proposal achieves good results in a wide range of situations.


Activity Driven Weakly Supervised Object Detection

Apr 02, 2019
Zhenheng Yang, Dhruv Mahajan, Deepti Ghadiyaram, Ram Nevatia, Vignesh Ramanathan

Weakly supervised object detection aims at reducing the amount of supervision required to train detection models. Such models are traditionally learned from images/videos labelled only with the object class and not the object bounding box. In our work, we try to leverage not only the object class labels but also the action labels associated with the data. We show that the action depicted in the image/video can provide strong cues about the location of the associated object. We learn a spatial prior for the object dependent on the action (e.g. "ball" is closer to "leg of the person" in "kicking ball"), and incorporate this prior to simultaneously train a joint object detection and action classification model. We conducted experiments on both video datasets and image datasets to evaluate the performance of our weakly supervised object detection model. Our approach outperformed the current state-of-the-art (SOTA) method by more than 6% in mAP on the Charades video dataset.

* CVPR'19 camera ready 

Enhanced Object Detection via Fusion With Prior Beliefs from Image Classification

Oct 21, 2016
Yilun Cao, Hyungtae Lee, Heesung Kwon

In this paper, we introduce a novel fusion method that can enhance object detection performance by fusing decisions from two different types of computer vision tasks: object detection and image classification. In the proposed work, the class label of an image obtained from image classification is viewed as prior knowledge about existence or non-existence of certain objects. The prior knowledge is then fused with the decisions of object detection to improve detection accuracy by mitigating false positives of an object detector that are strongly contradicted with the prior knowledge. A recently introduced novel fusion approach called dynamic belief fusion (DBF) is used to fuse the detector output with the classification prior. Experimental results show that the detection performance of all the detection algorithms used in the proposed work is improved on benchmark datasets via the proposed fusion framework.


Track to Detect and Segment: An Online Multi-Object Tracker

Mar 16, 2021
Jialian Wu, Jiale Cao, Liangchen Song, Yu Wang, Ming Yang, Junsong Yuan

Most online multi-object trackers perform object detection stand-alone in a neural net without any input from tracking. In this paper, we present a new online joint detection and tracking model, TraDeS (TRAck to DEtect and Segment), exploiting tracking clues to assist detection end-to-end. TraDeS infers object tracking offset by a cost volume, which is used to propagate previous object features for improving current object detection and segmentation. Effectiveness and superiority of TraDeS are shown on 4 datasets, including MOT (2D tracking), nuScenes (3D tracking), MOTS and Youtube-VIS (instance segmentation tracking). Project page:

* Accepted to CVPR 2021 

The Effects of Super-Resolution on Object Detection Performance in Satellite Imagery

Dec 12, 2018
Jacob Shermeyer, Adam Van Etten

We explore the application of super-resolution techniques to satellite imagery, and the effects of these techniques on object detection algorithm performance. Specifically, we enhance satellite imagery beyond its native resolution, and test if we can identify various types of vehicles, planes, and boats with greater accuracy than native resolution. Using the Very Deep Super-Resolution (VDSR) framework and a custom Random Forest Super-Resolution (RFSR) framework we generate enhancement levels of 2x, 4x, and 8x over five distinct resolutions ranging from 30 cm to 4.8 meters. Using both native and super-resolved data, we then train several custom detection models using the SIMRDWN object detection framework. SIMRDWN combines a number of popular object detection algorithms (e.g. SSD, YOLO) into a unified framework that is designed to rapidly detect objects in large satellite images. This approach allows us to quantify the effects of super-resolution techniques on object detection performance across multiple classes and resolutions. We also quantify the performance of object detection as a function of native resolution and object pixel size. For our test set we note that performance degrades from mAP = 0.5 at 30 cm resolution, down to mAP = 0.12 at 4.8 m resolution. Super-resolving native 30 cm imagery to 15 cm yields the greatest benefit; a 16-20% improvement in mAP. Super-resolution is less beneficial at coarser resolutions, though still provides a 3-10% improvement.


Unsupervised Object Detection with LiDAR Clues

Nov 27, 2020
Hao Tian, Yuntao Chen, Jifeng Dai, Zhaoxiang Zhang, Xizhou Zhu

Despite the importance of unsupervised object detection, to the best of our knowledge, there is no previous work addressing this problem. One main issue, widely known to the community, is that object boundaries derived only from 2D image appearance are ambiguous and unreliable. To address this, we exploit LiDAR clues to aid unsupervised object detection. By exploiting the 3D scene structure, the issue of localization can be considerably mitigated. We further identify another major issue, seldom noticed by the community, that the long-tailed and open-ended (sub-)category distribution should be accommodated. In this paper, we present the first practical method for unsupervised object detection with the aid of LiDAR clues. In our approach, candidate object segments based on 3D point clouds are firstly generated. Then, an iterative segment labeling process is conducted to assign segment labels and to train a segment labeling network, which is based on features from both 2D images and 3D point clouds. The labeling process is carefully designed so as to mitigate the issue of long-tailed and open-ended distribution. The final segment labels are set as pseudo annotations for object detection network training. Extensive experiments on the large-scale Waymo Open dataset suggest that the derived unsupervised object detection method achieves reasonable accuracy compared with that of strong supervision within the LiDAR visible range. Code shall be released.


Object Detection under Rainy Conditions for Autonomous Vehicles

Jul 10, 2020
Mazin Hnewa, Hayder Radha

Advanced automotive active-safety systems, in general, and autonomous vehicles, in particular, rely heavily on visual data to classify and localize objects such as pedestrians, traffic signs and lights, and other nearby cars, to assist the corresponding vehicles maneuver safely in their environments. However, the performance of object detection methods could degrade rather significantly under challenging weather scenarios including rainy conditions. Despite major advancements in the development of deraining approaches, the impact of rain on object detection has largely been understudied, especially in the context of autonomous driving. The main objective of this paper is to present a tutorial on state-of-the-art and emerging techniques that represent leading candidates for mitigating the influence of rainy conditions on an autonomous vehicle's ability to detect objects. Our goal includes surveying and analyzing the performance of object detection methods trained and tested using visual data captured under clear and rainy conditions. Moreover, we survey and evaluate the efficacy and limitations of leading deraining approaches, deep-learning based domain adaptation, and image translation frameworks that are being considered for addressing the problem of object detection under rainy conditions. Experimental results of a variety of the surveyed techniques are presented as part of this tutorial.

* Accepted in IEEE Signal Processing Magazine / Special Issue on Autonomous Driving 

TS4Net: Two-Stage Sample Selective Strategy for Rotating Object Detection

Aug 06, 2021
Kai Feng, Weixing Li, Jun Han, Feng Pan, Dongdong Zheng

Rotating object detection has wide applications in aerial photographs, remote sensing images, UAVs, etc. At present, most of the rotating object detection datasets focus on the field of remote sensing, and these images are usually shot in high-altitude scenes. However, image datasets captured at low-altitude areas also should be concerned, such as drone-based datasets. So we present a low-altitude dronebased dataset, named UAV-ROD, aiming to promote the research and development in rotating object detection and UAV applications. The UAV-ROD consists of 1577 images and 30,090 instances of car category annotated by oriented bounding boxes. In particular, The UAV-ROD can be utilized for the rotating object detection, vehicle orientation recognition and object counting tasks. Compared with horizontal object detection, the regression stage of the rotation detection is a tricky problem. In this paper, we propose a rotating object detector TS4Net, which contains anchor refinement module (ARM) and two-stage sample selective strategy (TS4). The ARM can convert preseted horizontal anchors into high-quality rotated anchors through twostage anchor refinement. The TS4 module utilizes different constrained sample selective strategies to allocate positive and negative samples, which is adaptive to the regression task in different stages. Benefiting from the ARM and TS4, the TS4Net can achieve superior performance for rotating object detection solely with one preseted horizontal anchor. Extensive experimental results on UAV-ROD dataset and three remote sensing datasets DOTA, HRSC2016 and UCAS-AOD demonstrate that our method achieves competitive performance against most state-of-the-art methods.

* 12 pages, 11 figures 

Learning to Detect Human-Object Interactions

Mar 01, 2018
Yu-Wei Chao, Yunfan Liu, Xieyang Liu, Huayi Zeng, Jia Deng

We study the problem of detecting human-object interactions (HOI) in static images, defined as predicting a human and an object bounding box with an interaction class label that connects them. HOI detection is a fundamental problem in computer vision as it provides semantic information about the interactions among the detected objects. We introduce HICO-DET, a new large benchmark for HOI detection, by augmenting the current HICO classification benchmark with instance annotations. To solve the task, we propose Human-Object Region-based Convolutional Neural Networks (HO-RCNN). At the core of our HO-RCNN is the Interaction Pattern, a novel DNN input that characterizes the spatial relations between two bounding boxes. Experiments on HICO-DET demonstrate that our HO-RCNN, by exploiting human-object spatial relations through Interaction Patterns, significantly improves the performance of HOI detection over baseline approaches.

* Accepted in WACV 2018 

MatrixNets: A New Scale and Aspect Ratio Aware Architecture for Object Detection

Jan 09, 2020
Abdullah Rashwan, Rishav Agarwal, Agastya Kalra, Pascal Poupart

We present MatrixNets (xNets), a new deep architecture for object detection. xNets map objects with similar sizes and aspect ratios into many specialized layers, allowing xNets to provide a scale and aspect ratio aware architecture. We leverage xNets to enhance single-stage object detection frameworks. First, we apply xNets on anchor-based object detection, for which we predict object centers and regress the top-left and bottom-right corners. Second, we use MatrixNets for corner-based object detection by predicting top-left and bottom-right corners. Each corner predicts the center location of the object. We also enhance corner-based detection by replacing the embedding layer with center regression. Our final architecture achieves mAP of 47.8 on MS COCO, which is higher than its CornerNet counterpart by +5.6 mAP while also closing the gap between single-stage and two-stage detectors. The code is available at

* This is the full paper for arXiv:1908.04646 with more applications, experiments, and ablation study