Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"Object Detection": models, code, and papers

ParkingSticker: A Real-World Object Detection Dataset

Feb 12, 2020
Caroline Potts, Ethem F. Can, Aysu Ezen-Can, Xiangqian Hu

We present a new and challenging object detection dataset, ParkingSticker, which mimics the type of data available in industry problems more closely than popular existing datasets like PASCAL VOC. ParkingSticker contains 1,871 images that come from a security camera's video footage. The objective is to identify parking stickers on cars approaching a gate that the security camera faces. Bounding boxes are drawn around parking stickers in the images. The parking stickers are much smaller on average than the objects in other popular object detection datasets; this makes ParkingSticker a challenging test for object detection methods. This dataset also very realistically represents the data available in many industry problems where a customer presents a few video frames and asks for a solution to a very difficult problem. Performance of various object detection pipelines using a YOLOv2 architecture are presented and indicate that identifying the parking stickers in ParkingSticker is challenging yet feasible. We believe that this dataset will challenge researchers to solve a real-world problem with real-world constraints such as non-ideal camera positioning and small object-size-to-image-size ratios.

* 8 pages, 8 figures; Updated authors 

Deep Learning for Generic Object Detection: A Survey

Sep 06, 2018
Li Liu, Wanli Ouyang, Xiaogang Wang, Paul Fieguth, Jie Chen, Xinwang Liu, Matti Pietikäinen

Generic object detection, aiming at locating object instances from a large number of predefined categories in natural images, is one of the most fundamental and challenging problems in computer vision. Deep learning techniques have emerged in recent years as powerful methods for learning feature representations directly from data, and have led to remarkable breakthroughs in the field of generic object detection. Given this time of rapid evolution, the goal of this paper is to provide a comprehensive survey of the recent achievements in this field brought by deep learning techniques. More than 250 key contributions are included in this survey, covering many aspects of generic object detection research: leading detection frameworks and fundamental subproblems including object feature representation, object proposal generation, context information modeling and training strategies; evaluation issues, specifically benchmark datasets, evaluation metrics, and state of the art performance. We finish by identifying promising directions for future research.

* Submitted to IJCV, 30pages 

NAS-FCOS: Efficient Search for Object Detection Architectures

Oct 24, 2021
Ning Wang, Yang Gao, Hao Chen, Peng Wang, Zhi Tian, Chunhua Shen, Yanning Zhang

Neural Architecture Search (NAS) has shown great potential in effectively reducing manual effort in network design by automatically discovering optimal architectures. What is noteworthy is that as of now, object detection is less touched by NAS algorithms despite its significant importance in computer vision. To the best of our knowledge, most of the recent NAS studies on object detection tasks fail to satisfactorily strike a balance between performance and efficiency of the resulting models, let alone the excessive amount of computational resources cost by those algorithms. Here we propose an efficient method to obtain better object detectors by searching for the feature pyramid network (FPN) as well as the prediction head of a simple anchor-free object detector, namely, FCOS [36], using a tailored reinforcement learning paradigm. With carefully designed search space, search algorithms, and strategies for evaluating network quality, we are able to find top-performing detection architectures within 4 days using 8 V100 GPUs. The discovered architectures surpass state-of-the-art object detection models (such as Faster R-CNN, Retina-Net and, FCOS) by 1.0% to 5.4% points in AP on the COCO dataset, with comparable computation complexity and memory footprint, demonstrating the efficacy of the proposed NAS method for object detection. Code is available at

* 14 pages, 11 figures, journal extension of NAS-FCOS (CVPR 2020), accepted by IJCV. arXiv admin note: substantial text overlap with arXiv:1906.04423 

Using 3D Shadows to Detect Object Hiding Attacks on Autonomous Vehicle Perception

Apr 29, 2022
Zhongyuan Hau, Soteris Demetriou, Emil C. Lupu

Autonomous Vehicles (AVs) are mostly reliant on LiDAR sensors which enable spatial perception of their surroundings and help make driving decisions. Recent works demonstrated attacks that aim to hide objects from AV perception, which can result in severe consequences. 3D shadows, are regions void of measurements in 3D point clouds which arise from occlusions of objects in a scene. 3D shadows were proposed as a physical invariant valuable for detecting spoofed or fake objects. In this work, we leverage 3D shadows to locate obstacles that are hidden from object detectors. We achieve this by searching for void regions and locating the obstacles that cause these shadows. Our proposed methodology can be used to detect an object that has been hidden by an adversary as these objects, while hidden from 3D object detectors, still induce shadow artifacts in 3D point clouds, which we use for obstacle detection. We show that using 3D shadows for obstacle detection can achieve high accuracy in matching shadows to their object and provide precise prediction of an obstacle's distance from the ego-vehicle.

* To appear in the Proceedings of the 2022 IEEE Security and Privacy Workshop on the Internet of Safe Things (SafeThings 2022) 

FMODetect: Robust Detection and Trajectory Estimation of Fast Moving Objects

Dec 15, 2020
Denys Rozumnyi, Jiri Matas, Filip Sroubek, Marc Pollefeys, Martin R. Oswald

We propose the first learning-based approach for detection and trajectory estimation of fast moving objects. Such objects are highly blurred and move over large distances within one video frame. Fast moving objects are associated with a deblurring and matting problem, also called deblatting. Instead of solving the complex deblatting problem jointly, we split the problem into matting and deblurring and solve them separately. The proposed method first detects all fast moving objects as a truncated distance function to the trajectory. Subsequently, a matting and fitting network for each detected object estimates the object trajectory and its blurred appearance without background. For the sharp appearance estimation, we propose an energy minimization based deblurring. The state-of-the-art methods are outperformed in terms of trajectory estimation and sharp appearance reconstruction. Compared to other methods, such as deblatting, the inference is of several orders of magnitude faster and allows applications such as real-time fast moving object detection and retrieval in large video collections.


RSN: Range Sparse Net for Efficient, Accurate LiDAR 3D Object Detection

Jun 25, 2021
Pei Sun, Weiyue Wang, Yuning Chai, Gamaleldin Elsayed, Alex Bewley, Xiao Zhang, Cristian Sminchisescu, Dragomir Anguelov

The detection of 3D objects from LiDAR data is a critical component in most autonomous driving systems. Safe, high speed driving needs larger detection ranges, which are enabled by new LiDARs. These larger detection ranges require more efficient and accurate detection models. Towards this goal, we propose Range Sparse Net (RSN), a simple, efficient, and accurate 3D object detector in order to tackle real time 3D object detection in this extended detection regime. RSN predicts foreground points from range images and applies sparse convolutions on the selected foreground points to detect objects. The lightweight 2D convolutions on dense range images results in significantly fewer selected foreground points, thus enabling the later sparse convolutions in RSN to efficiently operate. Combining features from the range image further enhance detection accuracy. RSN runs at more than 60 frames per second on a 150m x 150m detection region on Waymo Open Dataset (WOD) while being more accurate than previously published detectors. As of 11/2020, RSN is ranked first in the WOD leaderboard based on the APH/LEVEL 1 metrics for LiDAR-based pedestrian and vehicle detection, while being several times faster than alternatives.

* CVPR 2021 

Condensing Two-stage Detection with Automatic Object Key Part Discovery

Jun 23, 2020
Zhe Chen, Jing Zhang, Dacheng Tao

Modern two-stage object detectors generally require excessively large models for their detection heads to achieve high accuracy. To address this problem, we propose that the model parameters of two-stage detection heads can be condensed and reduced by concentrating on object key parts. To this end, we first introduce an automatic object key part discovery task to make neural networks discover representative sub-parts in each foreground object. With these discovered key parts, we then decompose the object appearance modeling into a key part modeling process and a global modeling process for detection. Key part modeling encodes fine and detailed features from the discovered key parts, and global modeling encodes rough and holistic object characteristics. In practice, such decomposition allows us to significantly abridge model parameters without sacrificing much detection accuracy. Experiments on popular datasets illustrate that our proposed technique consistently maintains original performance while waiving around 50% of the model parameters of common two-stage detection heads, with the performance only deteriorating by 1.5% when waiving around 96% of the original model parameters. Codes will be shortly released to the public through GitHub.


CG-SSD: Corner Guided Single Stage 3D Object Detection from LiDAR Point Cloud

Mar 05, 2022
Ruiqi Ma, Chi Chen, Bisheng Yang, Deren Li, Haiping Wang, Yangzi Cong, Zongtian Hu

At present, the anchor-based or anchor-free models that use LiDAR point clouds for 3D object detection use the center assigner strategy to infer the 3D bounding boxes. However, in a real world scene, the LiDAR can only acquire a limited object surface point clouds, but the center point of the object does not exist. Obtaining the object by aggregating the incomplete surface point clouds will bring a loss of accuracy in direction and dimension estimation. To address this problem, we propose a corner-guided anchor-free single-stage 3D object detection model (CG-SSD ).Firstly, 3D sparse convolution backbone network composed of residual layers and sub-manifold sparse convolutional layers are used to construct bird's eye view (BEV) features for further deeper feature mining by a lite U-shaped network; Secondly, a novel corner-guided auxiliary module (CGAM) is proposed to incorporate corner supervision signals into the neural network. CGAM is explicitly designed and trained to detect partially visible and invisible corners to obtains a more accurate object feature representation, especially for small or partial occluded objects; Finally, the deep features from both the backbone networks and CGAM module are concatenated and fed into the head module to predict the classification and 3D bounding boxes of the objects in the scene. The experiments demonstrate CG-SSD achieves the state-of-art performance on the ONCE benchmark for supervised 3D object detection using single frame point cloud data, with 62.77%mAP. Additionally, the experiments on ONCE and Waymo Open Dataset show that CGAM can be extended to most anchor-based models which use the BEV feature to detect objects, as a plug-in and bring +1.17%-+14.27%AP improvement.

* 27 pages