Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"Object Detection": models, code, and papers

Concealed Object Detection

Feb 20, 2021
Deng-Ping Fan, Ge-Peng Ji, Ming-Ming Cheng, Ling Shao

We present the first systematic study on concealed object detection (COD), which aims to identify objects that are "perfectly" embedded in their background. The high intrinsic similarities between the concealed objects and their background make COD far more challenging than traditional object detection/segmentation. To better understand this task, we collect a large-scale dataset, called COD10K, which consists of 10,000 images covering concealed objects in diverse real-world scenarios from 78 object categories. Further, we provide rich annotations including object categories, object boundaries, challenging attributes, object-level labels, and instance-level annotations. Our COD10K is the largest COD dataset to date, with the richest annotations, which enables comprehensive concealed object understanding and can even be used to help progress several other vision tasks, such as detection, segmentation, classification, etc. Motivated by how animals hunt in the wild, we also design a simple but strong baseline for COD, termed the Search Identification Network (SINet). Without any bells and whistles, SINet outperforms 12 cutting-edge baselines on all datasets tested, making them robust, general architectures that could serve as catalysts for future research in COD. Finally, we provide some interesting findings and highlight several potential applications and future directions. To spark research in this new field, our code, dataset, and online demo are available on our project page:

* 17 pages, 27 figures, Code: 
Access Paper or Ask Questions

Class-Aware Robust Adversarial Training for Object Detection

Mar 31, 2021
Pin-Chun Chen, Bo-Han Kung, Jun-Cheng Chen

Object detection is an important computer vision task with plenty of real-world applications; therefore, how to enhance its robustness against adversarial attacks has emerged as a crucial issue. However, most of the previous defense methods focused on the classification task and had few analysis in the context of the object detection task. In this work, to address the issue, we present a novel class-aware robust adversarial training paradigm for the object detection task. For a given image, the proposed approach generates an universal adversarial perturbation to simultaneously attack all the occurred objects in the image through jointly maximizing the respective loss for each object. Meanwhile, instead of normalizing the total loss with the number of objects, the proposed approach decomposes the total loss into class-wise losses and normalizes each class loss using the number of objects for the class. The adversarial training based on the class weighted loss can not only balances the influence of each class but also effectively and evenly improves the adversarial robustness of trained models for all the object classes as compared with the previous defense methods. Furthermore, with the recent development of fast adversarial training, we provide a fast version of the proposed algorithm which can be trained faster than the traditional adversarial training while keeping comparable performance. With extensive experiments on the challenging PASCAL-VOC and MS-COCO datasets, the evaluation results demonstrate that the proposed defense methods can effectively enhance the robustness of the object detection models.

Access Paper or Ask Questions

Non-imaging real-time detection and tracking of fast-moving objects

Aug 13, 2021
Fengming Zhou, Xuelei Shi, Jie Chen, Tianhang Tang, Yiguang Liu

Real-time detection and tracking of fast-moving objects have achieved great success in various fields. However, many existing methods, especially low-cost ones, are difficult to achieve real-time and long-term object detection and tracking. Here, a non-imaging strategy is proposed, including two stages, to realize fast-moving object detection and tracking in real-time and for the long term: 1) a contour-moments-based method is proposed to optimize the Hadamard pattern sequence. And then reconstructing projection curves of the object based on single-pixel imaging technology. The projection curve, which including the object location information, is reconstructed directly with the measurements collected by a single-pixel detector; 2) The fastest changing position in the projection curve can be obtained by solving first-order gradients. A gradient differential is used in two first-order gradients to calculate a differential curve with the sudden change positions. Finally, we can obtain the boundary information of the fast-moving object. We experimentally demonstrate that our approach can achieve a temporal resolution of 105 frames per second at a 1.28% sampling rate by using a 22,000 Hz digital micro-mirror device. The detection and tracking algorithm of the proposed strategy is computationally efficient. Compared with the state-of-the-art methods, our approach can make the sampling rate lower. Additionally, the strategy acquires not more than 1MB of data for each frame, which is capable of fast-moving object real-time and long-term detection and tracking.

Access Paper or Ask Questions

Achieving Real-Time Object Detection on MobileDevices with Neural Pruning Search

Jun 28, 2021
Pu Zhao, Wei Niu, Geng Yuan, Yuxuan Cai, Bin Ren, Yanzhi Wang, Xue Lin

Object detection plays an important role in self-driving cars for security development. However, mobile systems on self-driving cars with limited computation resources lead to difficulties for object detection. To facilitate this, we propose a compiler-aware neural pruning search framework to achieve high-speed inference on autonomous vehicles for 2D and 3D object detection. The framework automatically searches the pruning scheme and rate for each layer to find a best-suited pruning for optimizing detection accuracy and speed performance under compiler optimization. Our experiments demonstrate that for the first time, the proposed method achieves (close-to) real-time, 55ms and 99ms inference times for YOLOv4 based 2D object detection and PointPillars based 3D detection, respectively, on an off-the-shelf mobile phone with minor (or no) accuracy loss.

* Presented on the HiPEAC 2021 workshop (cogarch 2021) 
Access Paper or Ask Questions

Semi-supervised Learning for Dense Object Detection in Retail Scenes

Jul 05, 2021
Jaydeep Chauhan, Srikrishna Varadarajan, Muktabh Mayank Srivastava

Retail scenes usually contain densely packed high number of objects in each image. Standard object detection techniques use fully supervised training methodology. This is highly costly as annotating a large dense retail object detection dataset involves an order of magnitude more effort compared to standard datasets. Hence, we propose semi-supervised learning to effectively use the large amount of unlabeled data available in the retail domain. We adapt a popular self supervised method called noisy student initially proposed for object classification to the task of dense object detection. We show that using unlabeled data with the noisy student training methodology, we can improve the state of the art on precise detection of objects in densely packed retail scenes. We also show that performance of the model increases as you increase the amount of unlabeled data.

Access Paper or Ask Questions

ODDObjects: A Framework for Multiclass Unsupervised Anomaly Detection on Masked Objects

Apr 26, 2021
Ricky Ma

This paper presents a novel framework for unsupervised anomaly detection on masked objects called ODDObjects, which stands for Out-of-Distribution Detection on Objects. ODDObjects is designed to detect anomalies of various categories using unsupervised autoencoders trained on COCO-style datasets. The method utilizes autoencoder-based image reconstruction, where high reconstruction error indicates the possibility of an anomaly. The framework extends previous work on anomaly detection with autoencoders, comparing state-of-the-art models trained on object recognition datasets. Various model architectures were compared, and experimental results show that memory-augmented deep convolutional autoencoders perform the best at detecting out-of-distribution objects.

* 11 pages, 15 Postscript figures 
Access Paper or Ask Questions

Pseudo-IoU: Improving Label Assignment in Anchor-Free Object Detection

Apr 29, 2021
Jiachen Li, Bowen Cheng, Rogerio Feris, Jinjun Xiong, Thomas S. Huang, Wen-Mei Hwu, Humphrey Shi

Current anchor-free object detectors are quite simple and effective yet lack accurate label assignment methods, which limits their potential in competing with classic anchor-based models that are supported by well-designed assignment methods based on the Intersection-over-Union~(IoU) metric. In this paper, we present \textbf{Pseudo-Intersection-over-Union~(Pseudo-IoU)}: a simple metric that brings more standardized and accurate assignment rule into anchor-free object detection frameworks without any additional computational cost or extra parameters for training and testing, making it possible to further improve anchor-free object detection by utilizing training samples of good quality under effective assignment rules that have been previously applied in anchor-based methods. By incorporating Pseudo-IoU metric into an end-to-end single-stage anchor-free object detection framework, we observe consistent improvements in their performance on general object detection benchmarks such as PASCAL VOC and MSCOCO. Our method (single-model and single-scale) also achieves comparable performance to other recent state-of-the-art anchor-free methods without bells and whistles. Our code is based on mmdetection toolbox and will be made publicly available at

* CVPR 2021 Workshop 
Access Paper or Ask Questions

Learning Data Augmentation Strategies for Object Detection

Jun 26, 2019
Barret Zoph, Ekin D. Cubuk, Golnaz Ghiasi, Tsung-Yi Lin, Jonathon Shlens, Quoc V. Le

Data augmentation is a critical component of training deep learning models. Although data augmentation has been shown to significantly improve image classification, its potential has not been thoroughly investigated for object detection. Given the additional cost for annotating images for object detection, data augmentation may be of even greater importance for this computer vision task. In this work, we study the impact of data augmentation on object detection. We first demonstrate that data augmentation operations borrowed from image classification may be helpful for training detection models, but the improvement is limited. Thus, we investigate how learned, specialized data augmentation policies improve generalization performance for detection models. Importantly, these augmentation policies only affect training and leave a trained model unchanged during evaluation. Experiments on the COCO dataset indicate that an optimized data augmentation policy improves detection accuracy by more than +2.3 mAP, and allow a single inference model to achieve a state-of-the-art accuracy of 50.7 mAP. Importantly, the best policy found on COCO may be transferred unchanged to other detection datasets and models to improve predictive accuracy. For example, the best augmentation policy identified with COCO improves a strong baseline on PASCAL-VOC by +2.7 mAP. Our results also reveal that a learned augmentation policy is superior to state-of-the-art architecture regularization methods for object detection, even when considering strong baselines. Code for training with the learned policy is available online at

Access Paper or Ask Questions

Relation Graph Network for 3D Object Detection in Point Clouds

Nov 30, 2019
Mingtao Feng, Syed Zulqarnain Gilani, Yaonan Wang, Liang Zhang, Ajmal Mian

Convolutional Neural Networks (CNNs) have emerged as a powerful strategy for most object detection tasks on 2D images. However, their power has not been fully realised for detecting 3D objects in point clouds directly without converting them to regular grids. Existing state-of-art 3D object detection methods aim to recognize 3D objects individually without exploiting their relationships during learning or inference. In this paper, we first propose a strategy that associates the predictions of direction vectors and pseudo geometric centers together leading to a win-win solution for 3D bounding box candidates regression. Secondly, we propose point attention pooling to extract uniform appearance features for each 3D object proposal, benefiting from the learned direction features, semantic features and spatial coordinates of the object points. Finally, the appearance features are used together with the position features to build 3D object-object relationship graphs for all proposals to model their co-existence. We explore the effect of relation graphs on proposals' appearance features enhancement under supervised and unsupervised settings. The proposed relation graph network consists of a 3D object proposal generation module and a 3D relation module, makes it an end-to-end trainable network for detecting 3D object in point clouds. Experiments on challenging benchmarks ( SunRGB-Dand ScanNet datasets ) of 3D point clouds show that our algorithm can perform better than the existing state-of-the-art methods.

* Manuscript 
Access Paper or Ask Questions

Few-Shot Learning for Video Object Detection in a Transfer-Learning Scheme

Mar 30, 2021
Zhongjie Yu, Gaoang Wang, Lin Chen, Sebastian Raschka, Jiebo Luo

Different from static images, videos contain additional temporal and spatial information for better object detection. However, it is costly to obtain a large number of videos with bounding box annotations that are required for supervised deep learning. Although humans can easily learn to recognize new objects by watching only a few video clips, deep learning usually suffers from overfitting. This leads to an important question: how to effectively learn a video object detector from only a few labeled video clips? In this paper, we study the new problem of few-shot learning for video object detection. We first define the few-shot setting and create a new benchmark dataset for few-shot video object detection derived from the widely used ImageNet VID dataset. We employ a transfer-learning framework to effectively train the video object detector on a large number of base-class objects and a few video clips of novel-class objects. By analyzing the results of two methods under this framework (Joint and Freeze) on our designed weak and strong base datasets, we reveal insufficiency and overfitting problems. A simple but effective method, called Thaw, is naturally developed to trade off the two problems and validate our analysis. Extensive experiments on our proposed benchmark datasets with different scenarios demonstrate the effectiveness of our novel analysis in this new few-shot video object detection problem.

Access Paper or Ask Questions