Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"Object Detection": models, code, and papers

AutoShape: Real-Time Shape-Aware Monocular 3D Object Detection

Aug 25, 2021
Zongdai Liu, Dingfu Zhou, Feixiang Lu, Jin Fang, Liangjun Zhang

Existing deep learning-based approaches for monocular 3D object detection in autonomous driving often model the object as a rotated 3D cuboid while the object's geometric shape has been ignored. In this work, we propose an approach for incorporating the shape-aware 2D/3D constraints into the 3D detection framework. Specifically, we employ the deep neural network to learn distinguished 2D keypoints in the 2D image domain and regress their corresponding 3D coordinates in the local 3D object coordinate first. Then the 2D/3D geometric constraints are built by these correspondences for each object to boost the detection performance. For generating the ground truth of 2D/3D keypoints, an automatic model-fitting approach has been proposed by fitting the deformed 3D object model and the object mask in the 2D image. The proposed framework has been verified on the public KITTI dataset and the experimental results demonstrate that by using additional geometrical constraints the detection performance has been significantly improved as compared to the baseline method. More importantly, the proposed framework achieves state-of-the-art performance with real time. Data and code will be available at

Access Paper or Ask Questions

Drosophila-Inspired 3D Moving Object Detection Based on Point Clouds

May 06, 2020
Li Wang, Dawei Zhao, Tao Wu, Hao Fu, Zhiyu Wang, Liang Xiao, Xin Xu, Bin Dai

3D moving object detection is one of the most critical tasks in dynamic scene analysis. In this paper, we propose a novel Drosophila-inspired 3D moving object detection method using Lidar sensors. According to the theory of elementary motion detector, we have developed a motion detector based on the shallow visual neural pathway of Drosophila. This detector is sensitive to the movement of objects and can well suppress background noise. Designing neural circuits with different connection modes, the approach searches for motion areas in a coarse-to-fine fashion and extracts point clouds of each motion area to form moving object proposals. An improved 3D object detection network is then used to estimate the point clouds of each proposal and efficiently generates the 3D bounding boxes and the object categories. We evaluate the proposed approach on the widely-used KITTI benchmark, and state-of-the-art performance was obtained by using the proposed approach on the task of motion detection.

Access Paper or Ask Questions

Min-Entropy Latent Model for Weakly Supervised Object Detection

Feb 16, 2019
Fang Wan, Pengxu Wei, Zhenjun Han, Jianbin Jiao, Qixiang Ye

Weakly supervised object detection is a challenging task when provided with image category supervision but required to learn, at the same time, object locations and object detectors. The inconsistency between the weak supervision and learning objectives introduces significant randomness to object locations and ambiguity to detectors. In this paper, a min-entropy latent model (MELM) is proposed for weakly supervised object detection. Min-entropy serves as a model to learn object locations and a metric to measure the randomness of object localization during learning. It aims to principally reduce the variance of learned instances and alleviate the ambiguity of detectors. MELM is decomposed into three components including proposal clique partition, object clique discovery, and object localization. MELM is optimized with a recurrent learning algorithm, which leverages continuation optimization to solve the challenging non-convexity problem. Experiments demonstrate that MELM significantly improves the performance of weakly supervised object detection, weakly supervised object localization, and image classification, against the state-of-the-art approaches.

* Accepted by TPAMI 
Access Paper or Ask Questions

Single-Shot Bidirectional Pyramid Networks for High-Quality Object Detection

Mar 22, 2018
Xiongwei Wu, Daoxin Zhang, Jianke Zhu, Steven C. H. Hoi

Recent years have witnessed many exciting achievements for object detection using deep learning techniques. Despite achieving significant progresses, most existing detectors are designed to detect objects with relatively low-quality prediction of locations, i.e., often trained with the threshold of Intersection over Union (IoU) set to 0.5 by default, which can yield low-quality or even noisy detections. It remains an open challenge for how to devise and train a high-quality detector that can achieve more precise localization (i.e., IoU$>$0.5) without sacrificing the detection performance. In this paper, we propose a novel single-shot detection framework of Bidirectional Pyramid Networks (BPN) towards high-quality object detection, which consists of two novel components: (i) a Bidirectional Feature Pyramid structure for more effective and robust feature representations; and (ii) a Cascade Anchor Refinement to gradually refine the quality of predesigned anchors for more effective training. Our experiments showed that the proposed BPN achieves the best performances among all the single-stage object detectors on both PASCAL VOC and MS COCO datasets, especially for high-quality detections.

Access Paper or Ask Questions

Thermal Object Detection using Domain Adaptation through Style Consistency

Jun 01, 2020
Farzeen Munir, Shoaib Azam, Muhammad Aasim Rafique, Ahmad Muqeem Sheri, Moongu Jeon

A recent fatal accident of an autonomous vehicle opens a debate about the use of infrared technology in the sensor suite for autonomous driving to increase visibility for robust object detection. Thermal imaging has an advantage over lidar, radar, and camera because it can detect the heat difference emitted by objects in the infrared spectrum. In contrast, lidar and camera capture in the visible spectrum, and adverse weather conditions can impact their accuracy. The limitations of object detection in images from conventional imaging sensors can be catered to by thermal images. This paper presents a domain adaptation method for object detection in thermal images. We explore multiple ideas of domain adaption. First, a generative adversarial network is used to transfer the low-level features from the visible spectrum to the infrared spectrum domain through style consistency. Second, a cross-domain model with style consistency is used for object detection in the infrared spectrum by transferring the trained visible spectrum model. The proposed strategies are evaluated on publicly available thermal image datasets (FLIR ADAS and KAIST Multi-Spectral). We find that adapting the low-level features from the source domain to the target domain through domain adaptation increases in mean average precision by approximately 10%.

Access Paper or Ask Questions

SWIPENET: Object detection in noisy underwater images

Oct 19, 2020
Long Chen, Feixiang Zhou, Shengke Wang, Junyu Dong, Ning Li, Haiping Ma, Xin Wang, Huiyu Zhou

In recent years, deep learning based object detection methods have achieved promising performance in controlled environments. However, these methods lack sufficient capabilities to handle underwater object detection due to these challenges: (1) images in the underwater datasets and real applications are blurry whilst accompanying severe noise that confuses the detectors and (2) objects in real applications are usually small. In this paper, we propose a novel Sample-WeIghted hyPEr Network (SWIPENET), and a robust training paradigm named Curriculum Multi-Class Adaboost (CMA), to address these two problems at the same time. Firstly, the backbone of SWIPENET produces multiple high resolution and semantic-rich Hyper Feature Maps, which significantly improve small object detection. Secondly, a novel sample-weighted detection loss function is designed for SWIPENET, which focuses on learning high weight samples and ignore learning low weight samples. Moreover, inspired by the human education process that drives the learning from easy to hard concepts, we here propose the CMA training paradigm that first trains a clean detector which is free from the influence of noisy data. Then, based on the clean detector, multiple detectors focusing on learning diverse noisy data are trained and incorporated into a unified deep ensemble of strong noise immunity. Experiments on two underwater robot picking contest datasets (URPC2017 and URPC2018) show that the proposed SWIPENET+CMA framework achieves better accuracy in object detection against several state-of-the-art approaches.

* arXiv admin note: text overlap with arXiv:2005.11552 
Access Paper or Ask Questions

Deep Regionlets for Object Detection

Aug 23, 2018
Hongyu Xu, Xutao Lv, Xiaoyu Wang, Zhou Ren, Navaneeth Bodla, Rama Chellappa

In this paper, we propose a novel object detection framework named "Deep Regionlets" by establishing a bridge between deep neural networks and conventional detection schema for accurate generic object detection. Motivated by the abilities of regionlets for modeling object deformation and multiple aspect ratios, we incorporate regionlets into an end-to-end trainable deep learning framework. The deep regionlets framework consists of a region selection network and a deep regionlet learning module. Specifically, given a detection bounding box proposal, the region selection network provides guidance on where to select regions to learn the features from. The regionlet learning module focuses on local feature selection and transformation to alleviate local variations. To this end, we first realize non-rectangular region selection within the detection framework to accommodate variations in object appearance. Moreover, we design a "gating network" within the regionlet leaning module to enable soft regionlet selection and pooling. The Deep Regionlets framework is trained end-to-end without additional efforts. We perform ablation studies and conduct extensive experiments on the PASCAL VOC and Microsoft COCO datasets. The proposed framework outperforms state-of-the-art algorithms, such as RetinaNet and Mask R-CNN, even without additional segmentation labels.

* Accepted to ECCV 2018 
Access Paper or Ask Questions

DeepID-Net: multi-stage and deformable deep convolutional neural networks for object detection

Sep 11, 2014
Wanli Ouyang, Ping Luo, Xingyu Zeng, Shi Qiu, Yonglong Tian, Hongsheng Li, Shuo Yang, Zhe Wang, Yuanjun Xiong, Chen Qian, Zhenyao Zhu, Ruohui Wang, Chen-Change Loy, Xiaogang Wang, Xiaoou Tang

In this paper, we propose multi-stage and deformable deep convolutional neural networks for object detection. This new deep learning object detection diagram has innovations in multiple aspects. In the proposed new deep architecture, a new deformation constrained pooling (def-pooling) layer models the deformation of object parts with geometric constraint and penalty. With the proposed multi-stage training strategy, multiple classifiers are jointly optimized to process samples at different difficulty levels. A new pre-training strategy is proposed to learn feature representations more suitable for the object detection task and with good generalization capability. By changing the net structures, training strategies, adding and removing some key components in the detection pipeline, a set of models with large diversity are obtained, which significantly improves the effectiveness of modeling averaging. The proposed approach ranked \#2 in ILSVRC 2014. It improves the mean averaged precision obtained by RCNN, which is the state-of-the-art of object detection, from $31\%$ to $45\%$. Detailed component-wise analysis is also provided through extensive experimental evaluation.

Access Paper or Ask Questions

Monocular 3D Object Detection with Pseudo-LiDAR Point Cloud

Mar 31, 2019
Xinshuo Weng, Kris Kitani

Monocular 3D scene understanding tasks, such as object size estimation, heading angle estimation and 3D localization, is challenging. Successful modern day methods for 3D scene understanding require the use of a 3D sensor such as a depth camera, a stereo camera or LiDAR. On the other hand, single image based methods have significantly worse performance, but rightly so, as there is little explicit depth information in a 2D image. In this work, we aim at bridging the performance gap between 3D sensing and 2D sensing for 3D object detection by enhancing LiDAR-based algorithms to work with single image input. Specifically, we perform monocular depth estimation and lift the input image to a point cloud representation, which we call pseudo-LiDAR point cloud. Then we can train a LiDAR-based 3D detection network with our pseudo-LiDAR end-to-end. Following the pipeline of two-stage 3D detection algorithms, we detect 2D object proposals in the input image and extract a point cloud frustum from the pseudo-LiDAR for each proposal. Then an oriented 3D bounding box is detected for each frustum. To handle the large amount of noise in the pseudo-LiDAR, we propose two innovations: (1) use a 2D-3D bounding box consistency constraint, adjusting the predicted 3D bounding box to have a high overlap with its corresponding 2D proposal after projecting onto the image; (2) use the instance mask instead of the bounding box as the representation of 2D proposals, in order to reduce the number of points not belonging to the object in the point cloud frustum. Through our evaluation on the KITTI benchmark, we achieve the top-ranked performance on both bird's eye view and 3D object detection among all monocular methods, effectively quadrupling the performance over previous state-of-the-art.

Access Paper or Ask Questions

3D Object Detection and Tracking Based on Streaming Data

Sep 14, 2020
Xusen Guo, Jiangfeng Gu, Silu Guo, Zixiao Xu, Chengzhang Yang, Shanghua Liu, Long Cheng, Kai Huang

Recent approaches for 3D object detection have made tremendous progresses due to the development of deep learning. However, previous researches are mostly based on individual frames, leading to limited exploitation of information between frames. In this paper, we attempt to leverage the temporal information in streaming data and explore 3D streaming based object detection as well as tracking. Toward this goal, we set up a dual-way network for 3D object detection based on keyframes, and then propagate predictions to non-key frames through a motion based interpolation algorithm guided by temporal information. Our framework is not only shown to have significant improvements on object detection compared with frame-by-frame paradigm, but also proven to produce competitive results on KITTI Object Tracking Benchmark, with 76.68% in MOTA and 81.65% in MOTP respectively.

* Accepted by ICRA 2020 
Access Paper or Ask Questions