Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"Object Detection": models, code, and papers

Comparative study of 3D object detection frameworks based on LiDAR data and sensor fusion techniques

Feb 10, 2022
Sreenivasa Hikkal Venugopala

Estimating and understanding the surroundings of the vehicle precisely forms the basic and crucial step for the autonomous vehicle. The perception system plays a significant role in providing an accurate interpretation of a vehicle's environment in real-time. Generally, the perception system involves various subsystems such as localization, obstacle (static and dynamic) detection, and avoidance, mapping systems, and others. For perceiving the environment, these vehicles will be equipped with various exteroceptive (both passive and active) sensors in particular cameras, Radars, LiDARs, and others. These systems are equipped with deep learning techniques that transform the huge amount of data from the sensors into semantic information on which the object detection and localization tasks are performed. For numerous driving tasks, to provide accurate results, the location and depth information of a particular object is necessary. 3D object detection methods, by utilizing the additional pose data from the sensors such as LiDARs, stereo cameras, provides information on the size and location of the object. Based on recent research, 3D object detection frameworks performing object detection and localization on LiDAR data and sensor fusion techniques show significant improvement in their performance. In this work, a comparative study of the effect of using LiDAR data for object detection frameworks and the performance improvement seen by using sensor fusion techniques are performed. Along with discussing various state-of-the-art methods in both the cases, performing experimental analysis, and providing future research directions.

* 2021 International Conference on Industrial Automation, Robotics and Control Engineering (IARCE 2021) 

NETNet: Neighbor Erasing and Transferring Network for Better Single Shot Object Detection

Jan 18, 2020
Yazhao Li, Yanwei Pang, Jianbing Shen, Jiale Cao, Ling Shao

Due to the advantages of real-time detection and improved performance, single-shot detectors have gained great attention recently. To solve the complex scale variations, single-shot detectors make scale-aware predictions based on multiple pyramid layers. However, the features in the pyramid are not scale-aware enough, which limits the detection performance. Two common problems in single-shot detectors caused by object scale variations can be observed: (1) small objects are easily missed; (2) the salient part of a large object is sometimes detected as an object. With this observation, we propose a new Neighbor Erasing and Transferring (NET) mechanism to reconfigure the pyramid features and explore scale-aware features. In NET, a Neighbor Erasing Module (NEM) is designed to erase the salient features of large objects and emphasize the features of small objects in shallow layers. A Neighbor Transferring Module (NTM) is introduced to transfer the erased features and highlight large objects in deep layers. With this mechanism, a single-shot network called NETNet is constructed for scale-aware object detection. In addition, we propose to aggregate nearest neighboring pyramid features to enhance our NET. NETNet achieves 38.5% AP at a speed of 27 FPS and 32.0% AP at a speed of 55 FPS on MS COCO dataset. As a result, NETNet achieves a better trade-off for real-time and accurate object detection.

* 10 pages, 8 figures 

Nonmyopic View Planning for Active Object Detection

Sep 20, 2013
Nikolay Atanasov, Bharath Sankaran, Jerome Le Ny, George J. Pappas, Kostas Daniilidis

One of the central problems in computer vision is the detection of semantically important objects and the estimation of their pose. Most of the work in object detection has been based on single image processing and its performance is limited by occlusions and ambiguity in appearance and geometry. This paper proposes an active approach to object detection by controlling the point of view of a mobile depth camera. When an initial static detection phase identifies an object of interest, several hypotheses are made about its class and orientation. The sensor then plans a sequence of views, which balances the amount of energy used to move with the chance of identifying the correct hypothesis. We formulate an active hypothesis testing problem, which includes sensor mobility, and solve it using a point-based approximate POMDP algorithm. The validity of our approach is verified through simulation and real-world experiments with the PR2 robot. The results suggest that our approach outperforms the widely-used greedy view point selection and provides a significant improvement over static object detection.

* 12 pages (two-column); 7 figures; 2 tables; Manuscript submitted to the IEEE Transactions on Robotics (TRO) 

Multi-Class Detection and Segmentation of Objects in Depth

Jan 23, 2013
Cheng Zhang, Hedvig Kjellstrom

The quality of life of many people could be improved by autonomous humanoid robots in the home. To function in the human world, a humanoid household robot must be able to locate itself and perceive the environment like a human; scene perception, object detection and segmentation, and object spatial localization in 3D are fundamental capabilities for such humanoid robots. This paper presents a 3D multi-class object detection and segmentation method. The contributions are twofold. Firstly, we present a multi-class detection method, where a minimal joint codebook is learned in a principled manner. Secondly, we incorporate depth information using RGB-D imagery, which increases the robustness of the method and gives the 3D location of objects -- necessary since the robot reasons in 3D space. Experiments show that the multi-class extension improves the detection efficiency with respect to the number of classes and the depth extension improves the detection robustness and give sufficient natural 3D location of the objects.


Restoring Negative Information in Few-Shot Object Detection

Oct 26, 2020
Yukuan Yang, Fangyun Wei, Miaojing Shi, Guoqi Li

Few-shot learning has recently emerged as a new challenge in the deep learning field: unlike conventional methods that train the deep neural networks (DNNs) with a large number of labeled data, it asks for the generalization of DNNs on new classes with few annotated samples. Recent advances in few-shot learning mainly focus on image classification while in this paper we focus on object detection. The initial explorations in few-shot object detection tend to simulate a classification scenario by using the positive proposals in images with respect to certain object class while discarding the negative proposals of that class. Negatives, especially hard negatives, however, are essential to the embedding space learning in few-shot object detection. In this paper, we restore the negative information in few-shot object detection by introducing a new negative- and positive-representative based metric learning framework and a new inference scheme with negative and positive representatives. We build our work on a recent few-shot pipeline RepMet with several new modules to encode negative information for both training and testing. Extensive experiments on ImageNet-LOC and PASCAL VOC show our method substantially improves the state-of-the-art few-shot object detection solutions. Our code is available at

* To appear in NeurIPS 2020 

Range Adaptation for 3D Object Detection in LiDAR

Sep 26, 2019
Ze Wang, Sihao Ding, Ying Li, Minming Zhao, Sohini Roychowdhury, Andreas Wallin, Guillermo Sapiro, Qiang Qiu

LiDAR-based 3D object detection plays a crucial role in modern autonomous driving systems. LiDAR data often exhibit severe changes in properties across different observation ranges. In this paper, we explore cross-range adaptation for 3D object detection using LiDAR, i.e., far-range observations are adapted to near-range. This way, far-range detection is optimized for similar performance to near-range one. We adopt a bird-eyes view (BEV) detection framework to perform the proposed model adaptation. Our model adaptation consists of an adversarial global adaptation, and a fine-grained local adaptation. The proposed cross range adaptation framework is validated on three state-of-the-art LiDAR based object detection networks, and we consistently observe performance improvement on the far-range objects, without adding any auxiliary parameters to the model. To the best of our knowledge, this paper is the first attempt to study cross-range LiDAR adaptation for object detection in point clouds. To demonstrate the generality of the proposed adaptation framework, experiments on more challenging cross-device adaptation are further conducted, and a new LiDAR dataset with high-quality annotated point clouds is released to promote future research.


Contrastive Object Detection Using Knowledge Graph Embeddings

Dec 21, 2021
Christopher Lang, Alexander Braun, Abhinav Valada

Object recognition for the most part has been approached as a one-hot problem that treats classes to be discrete and unrelated. Each image region has to be assigned to one member of a set of objects, including a background class, disregarding any similarities in the object types. In this work, we compare the error statistics of the class embeddings learned from a one-hot approach with semantically structured embeddings from natural language processing or knowledge graphs that are widely applied in open world object detection. Extensive experimental results on multiple knowledge-embeddings as well as distance metrics indicate that knowledge-based class representations result in more semantically grounded misclassifications while performing on par compared to one-hot methods on the challenging COCO and Cityscapes object detection benchmarks. We generalize our findings to multiple object detection architectures by proposing a knowledge-embedded design for keypoint-based and transformer-based object detection architectures.


CenterFusion: Center-based Radar and Camera Fusion for 3D Object Detection

Nov 10, 2020
Ramin Nabati, Hairong Qi

The perception system in autonomous vehicles is responsible for detecting and tracking the surrounding objects. This is usually done by taking advantage of several sensing modalities to increase robustness and accuracy, which makes sensor fusion a crucial part of the perception system. In this paper, we focus on the problem of radar and camera sensor fusion and propose a middle-fusion approach to exploit both radar and camera data for 3D object detection. Our approach, called CenterFusion, first uses a center point detection network to detect objects by identifying their center points on the image. It then solves the key data association problem using a novel frustum-based method to associate the radar detections to their corresponding object's center point. The associated radar detections are used to generate radar-based feature maps to complement the image features, and regress to object properties such as depth, rotation and velocity. We evaluate CenterFusion on the challenging nuScenes dataset, where it improves the overall nuScenes Detection Score (NDS) of the state-of-the-art camera-based algorithm by more than 12%. We further show that CenterFusion significantly improves the velocity estimation accuracy without using any additional temporal information. The code is available at .

* WACV 2021