Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"Object Detection": models, code, and papers

Salient Object Detection: A Benchmark

Feb 27, 2018
Ali Borji, Ming-Ming Cheng, Huaizu Jiang, Jia Li

We extensively compare, qualitatively and quantitatively, 40 state-of-the-art models (28 salient object detection, 10 fixation prediction, 1 objectness, and 1 baseline) over 6 challenging datasets for the purpose of benchmarking salient object detection and segmentation methods. From the results obtained so far, our evaluation shows a consistent rapid progress over the last few years in terms of both accuracy and running time. The top contenders in this benchmark significantly outperform the models identified as the best in the previous benchmark conducted just two years ago. We find that the models designed specifically for salient object detection generally work better than models in closely related areas, which in turn provides a precise definition and suggests an appropriate treatment of this problem that distinguishes it from other problems. In particular, we analyze the influences of center bias and scene complexity in model performance, which, along with the hard cases for state-of-the-art models, provide useful hints towards constructing more challenging large scale datasets and better saliency models. Finally, we propose probable solutions for tackling several open problems such as evaluation scores and dataset bias, which also suggest future research directions in the rapidly-growing field of salient object detection.

* Image Processing, IEEE Transactions on (Volume:24, Issue: 12), 2015 

CubeSLAM: Monocular 3D Object Detection and SLAM without Prior Models

Jun 01, 2018
Shichao Yang, Sebastian Scherer

We present a method for single image 3D cuboid object detection and multi-view object SLAM without prior object model, and demonstrate that the two aspects can benefit each other. For 3D detection, we generate high quality cuboid proposals from 2D bounding boxes and vanishing points sampling. The proposals are further scored and selected to align with image edges. Experiments on SUN RGBD and KITTI shows the efficiency and accuracy over existing approaches. Then in the second part, multi-view bundle adjustment with novel measurement functions is proposed to jointly optimize camera poses, objects and points, utilizing single view detection results. Objects can provide more geometric constraints and scale consistency compared to points. On the collected and public TUM and KITTI odometry datasets, we achieve better pose estimation accuracy over the state-of-the-art monocular SLAM while also improve the 3D object detection accuracy at the same time.


Category-Aware Transformer Network for Better Human-Object Interaction Detection

Apr 11, 2022
Leizhen Dong, Zhimin Li, Kunlun Xu, Zhijun Zhang, Luxin Yan, Sheng Zhong, Xu Zou

Human-Object Interactions (HOI) detection, which aims to localize a human and a relevant object while recognizing their interaction, is crucial for understanding a still image. Recently, transformer-based models have significantly advanced the progress of HOI detection. However, the capability of these models has not been fully explored since the Object Query of the model is always simply initialized as just zeros, which would affect the performance. In this paper, we try to study the issue of promoting transformer-based HOI detectors by initializing the Object Query with category-aware semantic information. To this end, we innovatively propose the Category-Aware Transformer Network (CATN). Specifically, the Object Query would be initialized via category priors represented by an external object detection model to yield better performance. Moreover, such category priors can be further used for enhancing the representation ability of features via the attention mechanism. We have firstly verified our idea via the Oracle experiment by initializing the Object Query with the groundtruth category information. And then extensive experiments have been conducted to show that a HOI detection model equipped with our idea outperforms the baseline by a large margin to achieve a new state-of-the-art result.


AutoShape: Real-Time Shape-Aware Monocular 3D Object Detection

Aug 25, 2021
Zongdai Liu, Dingfu Zhou, Feixiang Lu, Jin Fang, Liangjun Zhang

Existing deep learning-based approaches for monocular 3D object detection in autonomous driving often model the object as a rotated 3D cuboid while the object's geometric shape has been ignored. In this work, we propose an approach for incorporating the shape-aware 2D/3D constraints into the 3D detection framework. Specifically, we employ the deep neural network to learn distinguished 2D keypoints in the 2D image domain and regress their corresponding 3D coordinates in the local 3D object coordinate first. Then the 2D/3D geometric constraints are built by these correspondences for each object to boost the detection performance. For generating the ground truth of 2D/3D keypoints, an automatic model-fitting approach has been proposed by fitting the deformed 3D object model and the object mask in the 2D image. The proposed framework has been verified on the public KITTI dataset and the experimental results demonstrate that by using additional geometrical constraints the detection performance has been significantly improved as compared to the baseline method. More importantly, the proposed framework achieves state-of-the-art performance with real time. Data and code will be available at


Drosophila-Inspired 3D Moving Object Detection Based on Point Clouds

May 06, 2020
Li Wang, Dawei Zhao, Tao Wu, Hao Fu, Zhiyu Wang, Liang Xiao, Xin Xu, Bin Dai

3D moving object detection is one of the most critical tasks in dynamic scene analysis. In this paper, we propose a novel Drosophila-inspired 3D moving object detection method using Lidar sensors. According to the theory of elementary motion detector, we have developed a motion detector based on the shallow visual neural pathway of Drosophila. This detector is sensitive to the movement of objects and can well suppress background noise. Designing neural circuits with different connection modes, the approach searches for motion areas in a coarse-to-fine fashion and extracts point clouds of each motion area to form moving object proposals. An improved 3D object detection network is then used to estimate the point clouds of each proposal and efficiently generates the 3D bounding boxes and the object categories. We evaluate the proposed approach on the widely-used KITTI benchmark, and state-of-the-art performance was obtained by using the proposed approach on the task of motion detection.


Min-Entropy Latent Model for Weakly Supervised Object Detection

Feb 16, 2019
Fang Wan, Pengxu Wei, Zhenjun Han, Jianbin Jiao, Qixiang Ye

Weakly supervised object detection is a challenging task when provided with image category supervision but required to learn, at the same time, object locations and object detectors. The inconsistency between the weak supervision and learning objectives introduces significant randomness to object locations and ambiguity to detectors. In this paper, a min-entropy latent model (MELM) is proposed for weakly supervised object detection. Min-entropy serves as a model to learn object locations and a metric to measure the randomness of object localization during learning. It aims to principally reduce the variance of learned instances and alleviate the ambiguity of detectors. MELM is decomposed into three components including proposal clique partition, object clique discovery, and object localization. MELM is optimized with a recurrent learning algorithm, which leverages continuation optimization to solve the challenging non-convexity problem. Experiments demonstrate that MELM significantly improves the performance of weakly supervised object detection, weakly supervised object localization, and image classification, against the state-of-the-art approaches.

* Accepted by TPAMI 

Single-Shot Bidirectional Pyramid Networks for High-Quality Object Detection

Mar 22, 2018
Xiongwei Wu, Daoxin Zhang, Jianke Zhu, Steven C. H. Hoi

Recent years have witnessed many exciting achievements for object detection using deep learning techniques. Despite achieving significant progresses, most existing detectors are designed to detect objects with relatively low-quality prediction of locations, i.e., often trained with the threshold of Intersection over Union (IoU) set to 0.5 by default, which can yield low-quality or even noisy detections. It remains an open challenge for how to devise and train a high-quality detector that can achieve more precise localization (i.e., IoU$>$0.5) without sacrificing the detection performance. In this paper, we propose a novel single-shot detection framework of Bidirectional Pyramid Networks (BPN) towards high-quality object detection, which consists of two novel components: (i) a Bidirectional Feature Pyramid structure for more effective and robust feature representations; and (ii) a Cascade Anchor Refinement to gradually refine the quality of predesigned anchors for more effective training. Our experiments showed that the proposed BPN achieves the best performances among all the single-stage object detectors on both PASCAL VOC and MS COCO datasets, especially for high-quality detections.


Thermal Object Detection using Domain Adaptation through Style Consistency

Jun 01, 2020
Farzeen Munir, Shoaib Azam, Muhammad Aasim Rafique, Ahmad Muqeem Sheri, Moongu Jeon

A recent fatal accident of an autonomous vehicle opens a debate about the use of infrared technology in the sensor suite for autonomous driving to increase visibility for robust object detection. Thermal imaging has an advantage over lidar, radar, and camera because it can detect the heat difference emitted by objects in the infrared spectrum. In contrast, lidar and camera capture in the visible spectrum, and adverse weather conditions can impact their accuracy. The limitations of object detection in images from conventional imaging sensors can be catered to by thermal images. This paper presents a domain adaptation method for object detection in thermal images. We explore multiple ideas of domain adaption. First, a generative adversarial network is used to transfer the low-level features from the visible spectrum to the infrared spectrum domain through style consistency. Second, a cross-domain model with style consistency is used for object detection in the infrared spectrum by transferring the trained visible spectrum model. The proposed strategies are evaluated on publicly available thermal image datasets (FLIR ADAS and KAIST Multi-Spectral). We find that adapting the low-level features from the source domain to the target domain through domain adaptation increases in mean average precision by approximately 10%.


SWIPENET: Object detection in noisy underwater images

Oct 19, 2020
Long Chen, Feixiang Zhou, Shengke Wang, Junyu Dong, Ning Li, Haiping Ma, Xin Wang, Huiyu Zhou

In recent years, deep learning based object detection methods have achieved promising performance in controlled environments. However, these methods lack sufficient capabilities to handle underwater object detection due to these challenges: (1) images in the underwater datasets and real applications are blurry whilst accompanying severe noise that confuses the detectors and (2) objects in real applications are usually small. In this paper, we propose a novel Sample-WeIghted hyPEr Network (SWIPENET), and a robust training paradigm named Curriculum Multi-Class Adaboost (CMA), to address these two problems at the same time. Firstly, the backbone of SWIPENET produces multiple high resolution and semantic-rich Hyper Feature Maps, which significantly improve small object detection. Secondly, a novel sample-weighted detection loss function is designed for SWIPENET, which focuses on learning high weight samples and ignore learning low weight samples. Moreover, inspired by the human education process that drives the learning from easy to hard concepts, we here propose the CMA training paradigm that first trains a clean detector which is free from the influence of noisy data. Then, based on the clean detector, multiple detectors focusing on learning diverse noisy data are trained and incorporated into a unified deep ensemble of strong noise immunity. Experiments on two underwater robot picking contest datasets (URPC2017 and URPC2018) show that the proposed SWIPENET+CMA framework achieves better accuracy in object detection against several state-of-the-art approaches.

* arXiv admin note: text overlap with arXiv:2005.11552 

Deep Regionlets for Object Detection

Aug 23, 2018
Hongyu Xu, Xutao Lv, Xiaoyu Wang, Zhou Ren, Navaneeth Bodla, Rama Chellappa

In this paper, we propose a novel object detection framework named "Deep Regionlets" by establishing a bridge between deep neural networks and conventional detection schema for accurate generic object detection. Motivated by the abilities of regionlets for modeling object deformation and multiple aspect ratios, we incorporate regionlets into an end-to-end trainable deep learning framework. The deep regionlets framework consists of a region selection network and a deep regionlet learning module. Specifically, given a detection bounding box proposal, the region selection network provides guidance on where to select regions to learn the features from. The regionlet learning module focuses on local feature selection and transformation to alleviate local variations. To this end, we first realize non-rectangular region selection within the detection framework to accommodate variations in object appearance. Moreover, we design a "gating network" within the regionlet leaning module to enable soft regionlet selection and pooling. The Deep Regionlets framework is trained end-to-end without additional efforts. We perform ablation studies and conduct extensive experiments on the PASCAL VOC and Microsoft COCO datasets. The proposed framework outperforms state-of-the-art algorithms, such as RetinaNet and Mask R-CNN, even without additional segmentation labels.

* Accepted to ECCV 2018