"Object Detection": models, code, and papers

Leveraging Bottom-Up and Top-Down Attention for Few-Shot Object Detection

Jul 23, 2020
Xianyu Chen, Ming Jiang, Qi Zhao

Few-shot object detection aims at detecting objects with few annotated examples, which remains a challenging research problem yet to be explored. Recent studies have shown the effectiveness of self-learned top-down attention mechanisms in object detection and other vision tasks. The top-down attention, however, is less effective at improving the performance of few-shot detectors. Due to the insufficient training data, object detectors cannot effectively generate attention maps for few-shot examples. To improve the performance and interpretability of few-shot object detectors, we propose an attentive few-shot object detection network (AttFDNet) that takes advantage of both top-down and bottom-up attention. Being task-agnostic, the bottom-up attention serves as a prior that helps detect and localize naturally salient objects. We further address specific challenges in few-shot object detection by introducing two novel loss terms and a hybrid few-shot learning strategy. Experimental results and visualization demonstrate the complementary nature of the two types of attention and their roles in few-shot object detection. Codes are available at

* This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible 

Any-Shot Object Detection

Mar 16, 2020
Shafin Rahman, Salman Khan, Nick Barnes, Fahad Shahbaz Khan

Previous work on novel object detection considers zero or few-shot settings where none or few examples of each category are available for training. In real world scenarios, it is less practical to expect that 'all' the novel classes are either unseen or have few examples. Here, we propose a more realistic setting termed 'Any-shot detection', where totally unseen and few-shot categories can simultaneously co-occur during inference. Any-shot detection offers unique challenges compared to conventional novel object detection, such as a high imbalance between unseen, few-shot and seen object classes, susceptibility to forgetting base-training while learning novel classes, and distinguishing novel classes from the background. To address these challenges, we propose a unified any-shot detection model that can concurrently learn to detect both zero-shot and few-shot object classes. Our core idea is to use class semantics as prototypes for object detection, a formulation that naturally minimizes knowledge forgetting and mitigates the class-imbalance in the label space. Besides, we propose a rebalanced loss function that emphasizes difficult few-shot cases but avoids overfitting on the novel classes to allow detection of totally unseen classes. Without bells and whistles, our framework can also be used solely for Zero-shot detection and Few-shot detection tasks. We report extensive experiments on Pascal VOC and MS-COCO datasets where our approach is shown to provide significant improvements.
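
The idea of using class semantics as prototypes can be illustrated with a minimal sketch. It assumes that a region feature has already been projected into the same space as the class semantic vectors and that classes are scored by cosine similarity; the abstract does not state the scoring function, and the vectors, class names, and the `classify` helper below are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def classify(feature, prototypes):
    """Score a region feature against every class prototype and pick the best.

    Assumption: `feature` already lives in the semantic space of `prototypes`.
    """
    scores = {name: cosine(feature, vec) for name, vec in prototypes.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical semantic vectors; a real system would use learned word embeddings.
prototypes = {
    "cat": [1.0, 0.0, 0.2],    # seen class
    "dog": [0.8, 0.3, 0.1],    # few-shot class
    "zebra": [0.0, 1.0, 0.5],  # unseen class (no training images)
}

feature = [0.1, 0.9, 0.4]  # made-up projected region feature
label, scores = classify(feature, prototypes)
print(label)
```

Because an unseen class only needs a semantic vector, not training images, it can be scored alongside seen and few-shot classes in the same label space.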


Camouflaged Object Detection and Tracking: A Survey

Dec 25, 2020
Ajoy Mondal

Moving object detection and tracking have various applications, including surveillance, anomaly detection, vehicle navigation, etc. The literature on object detection and tracking is rich, and several essential survey papers exist. However, the research on camouflaged object detection and tracking is limited due to the complexity of the problem. Existing work on this problem has been based on either the biological characteristics of the camouflaged objects or computer vision techniques. In this article, we review the existing camouflaged object detection and tracking techniques using computer vision algorithms from a theoretical point of view. This article also addresses several issues of interest as well as future research directions in this area. We hope this review will help the reader learn about the recent advances in camouflaged object detection and tracking.

* International Journal of Image and Graphics, 2020 

Active Terahertz Imaging Dataset for Concealed Object Detection

May 08, 2021
Dong Liang, Fei Xue, Ling Li

Concealed object detection in Terahertz imaging is an urgent need for public security and counter-terrorism. In this paper, we provide a public dataset for evaluating multi-object detection algorithms in active Terahertz imaging at a resolution of 5 mm by 5 mm. To the best of our knowledge, this is the first public Terahertz imaging dataset prepared to evaluate object detection algorithms. Object detection on this dataset is much more difficult than on standard public object detection datasets due to its inferior imaging quality. Facing the problems of imbalanced samples and hard training samples in object detection, we evaluate four popular detectors on this dataset: YOLOv3, YOLOv4, FRCN-OHEM, and RetinaNet. Experimental results indicate that RetinaNet achieves the highest mAP. In addition, we demonstrate that hiding objects in different parts of the human body affects detection accuracy. The dataset is available at


Bridging the Gap Between Object Detection and User Intent via Query-Modulation

Jun 18, 2021
Marco Fornoni, Chaochao Yan, Liangchen Luo, Kimberly Wilber, Alex Stark, Yin Cui, Boqing Gong, Andrew Howard

When interacting with objects through cameras, or pictures, users often have a specific intent. For example, they may want to perform a visual search. However, most object detection models ignore the user intent, relying on image pixels as their only input. This often leads to incorrect results, such as lack of a high-confidence detection on the object of interest, or detection with a wrong class label. In this paper we investigate techniques to modulate standard object detectors to explicitly account for the user intent, expressed as an embedding of a simple query. Compared to standard object detectors, query-modulated detectors show superior performance at detecting objects for a given label of interest. Thanks to large-scale training data synthesized from standard object detection annotations, query-modulated detectors can also outperform specialized referring expression recognition systems. Furthermore, they can be simultaneously trained to solve for both query-modulated detection and standard object detection.


Object Detection from Video Tubelets with Convolutional Neural Networks

Apr 14, 2016
Kai Kang, Wanli Ouyang, Hongsheng Li, Xiaogang Wang

Deep Convolutional Neural Networks (CNNs) have shown impressive performance in various vision tasks such as image classification, object detection and semantic segmentation. For object detection, particularly in still images, performance has increased significantly in the past year thanks to powerful deep networks (e.g. GoogLeNet) and detection frameworks (e.g. Regions with CNN features (R-CNN)). The recently introduced ImageNet task on object detection from video (VID) brings the object detection task into the video domain, in which objects' locations in each frame are required to be annotated with bounding boxes. In this work, we introduce a complete framework for the VID task based on still-image object detection and general object tracking. Their relations and contributions in the VID task are thoroughly studied and evaluated. In addition, a temporal convolution network is proposed to incorporate temporal information to regularize the detection results, and it shows its effectiveness for the task.

* Computer Vision and Pattern Recognition (CVPR), 2016 IEEE Conference on (pp. 817-825) 
* Accepted in CVPR 2016 as a Spotlight paper 
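
To illustrate how temporal information can regularize detection results along a tubelet, here is a minimal sketch. It replaces the paper's learned temporal convolution network with a fixed 1-D smoothing kernel, which is an assumption made purely for illustration; the function name, kernel weights, and scores are hypothetical.

```python
def temporal_smooth(scores, kernel=(0.25, 0.5, 0.25)):
    """Convolve per-frame detection scores of one tubelet with a 1-D kernel.

    Assumption: a fixed averaging kernel stands in for learned temporal
    filters. Frame indices are clamped at the tubelet boundaries.
    """
    half = len(kernel) // 2
    n = len(scores)
    smoothed = []
    for t in range(n):
        acc = 0.0
        for k, weight in enumerate(kernel):
            idx = min(max(t + k - half, 0), n - 1)  # clamp to valid frames
            acc += weight * scores[idx]
        smoothed.append(acc)
    return smoothed


# Made-up scores for one object tracked over five frames; frame 2 is an
# isolated low score, e.g. caused by motion blur.
tubelet_scores = [0.9, 0.8, 0.1, 0.85, 0.9]
smoothed = temporal_smooth(tubelet_scores)
print(smoothed)
```

The isolated dip in the middle frame is pulled up by its neighbours, which is the kind of temporal consistency a still-image detector cannot enforce on its own.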

A Survey of Deep Learning-based Object Detection

Jul 11, 2019
Licheng Jiao, Fan Zhang, Fang Liu, Shuyuan Yang, Lingling Li, Zhixi Feng, Rong Qu

Object detection is one of the most important and challenging branches of computer vision. Its purpose is to locate instances of semantic objects of a certain class, and it has been widely applied in people's lives, for example in security monitoring and autonomous driving. With the rapid development of deep learning networks for detection tasks, the performance of object detectors has greatly improved. To give a thorough and deep understanding of the main development status of the object detection pipeline, in this survey we first analyze the methods of existing typical detection models and describe the benchmark datasets. Afterwards and primarily, we provide a comprehensive overview of a variety of object detection methods in a systematic manner, covering the one-stage and two-stage detectors. Moreover, we list the traditional and new applications. Some representative branches of object detection are analyzed as well. Finally, we discuss the architecture of exploiting these object detection methods to build an effective and efficient system and point out a set of development trends to better follow the state-of-the-art algorithms and further research.

* 29 pages, 12 figures 

Clustered Object Detection in Aerial Images

Apr 16, 2019
Fan Yang, Heng Fan, Peng Chu, Erik Blasch, Haibin Ling

Detecting objects in aerial images is challenging for at least two reasons: (1) target objects like pedestrians are very small in terms of pixels, making them hard to distinguish from the surrounding background; and (2) targets are in general very sparsely and nonuniformly distributed, making the detection very inefficient. In this paper we address both issues, inspired by the observation that these targets are often clustered. In particular, we propose a Clustered Detection (ClusDet) network that unifies object clustering and detection in an end-to-end framework. The key components in ClusDet include a cluster proposal sub-network (CPNet), a scale estimation sub-network (ScaleNet), and a dedicated detection network (DetecNet). Given an input image, CPNet produces (object) cluster regions and ScaleNet estimates object scales for these regions. Then, each scale-normalized cluster region and its features are fed into DetecNet for object detection. Compared with previous solutions, ClusDet has several advantages: (1) it greatly reduces the number of blocks for final object detection and hence achieves high running time efficiency, (2) the cluster-based scale estimation is more accurate than previously used single-object based ones, and hence effectively improves the detection of small objects, and (3) the final DetecNet is dedicated to clustered regions and implicitly models the prior context information so as to boost detection accuracy. The proposed method is tested on three representative aerial image datasets including VisDrone, UAVDT and DOTA. In all the experiments, ClusDet achieves promising performance in both efficiency and accuracy, in comparison with state-of-the-art detectors.
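
When detection runs on cropped and rescaled cluster regions, each box has to be mapped back to full-image coordinates before results can be merged. The sketch below shows only that coordinate bookkeeping, under the assumption that a crop is resized by a single uniform factor; the function name, the box layout `(x1, y1, x2, y2)`, and all numbers are hypothetical, and nothing here reproduces CPNet, ScaleNet, or DetecNet themselves.

```python
def crop_box_to_image(box, region, scale):
    """Map a box from a rescaled cluster crop back to image coordinates.

    box:    (x1, y1, x2, y2) in the resized crop's pixel coordinates
    region: (x1, y1, x2, y2) of the cluster region in the original image
    scale:  uniform resize factor applied to the crop (assumed > 0)
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    region_x, region_y = region[0], region[1]
    x1, y1, x2, y2 = box
    return (
        region_x + x1 / scale,
        region_y + y1 / scale,
        region_x + x2 / scale,
        region_y + y2 / scale,
    )


# A cluster region upsampled 2x so that small objects become easier to detect.
region = (100, 200, 300, 400)
detection_in_crop = (20, 40, 60, 80)
detection_in_image = crop_box_to_image(detection_in_crop, region, scale=2.0)
print(detection_in_image)
```

After this mapping, detections from all cluster regions share one coordinate frame and can be combined with detections from the full image, for example with non-maximum suppression.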


Speedy Object Detection based on Shape

Jul 12, 2013
Y. Jayanta Singh, Shalu Gupta

This study is part of the design of an audio system for in-house object detection for visually impaired people, whose low vision may be present from birth or caused by an accident or old age. The system takes a scene as input and produces audio as output. An alert facility is provided based on the severity levels of objects (snake, broken glass, etc.) and also during difficulties. The study proposes techniques for speedy detection of objects based on their shape and scale. Features are extracted so as to occupy minimum space using dynamic scaling. From a scene, clusters of objects are formed based on scale and shape. Searching is initially performed among the clusters based on the shape, scale, mean cluster value and index of the object(s). The minimum operation needed to detect the possible shape of the object is performed. If the object does not have a likely matching shape, scale, etc., the several operations required for object detection are not performed; instead, it is declared a new object. In this way, the study finds a speedy way of detecting objects.

* The International Journal of Multimedia & Its Applications (IJMA) Vol.5, No.3, June 2013 
* arXiv admin note: text overlap with arXiv:1210.7038 by other authors 

Few-shot Object Detection via Feature Reweighting

Dec 05, 2018
Bingyi Kang, Zhuang Liu, Xin Wang, Fisher Yu, Jiashi Feng, Trevor Darrell

This work aims to solve the challenging few-shot object detection problem where only a few annotated examples are available for each object category to train a detection model. Such an ability to learn to detect an object from just a few examples is common for human vision systems, but remains absent for computer vision systems. Though few-shot meta learning offers a promising solution technique, previous works mostly target the task of image classification and are not directly applicable to the much more complicated object detection task. In this work, we propose a novel meta-learning based model with a carefully designed architecture, which consists of a meta-model and a base detection model. The base detection model is trained on several base classes with sufficient samples to offer basis features. The meta-model is trained to reweight the importance of features from the base detection model over the input image and adapt these features to assist novel object detection from a few examples. The meta-model is light-weight, end-to-end trainable and able to quickly endow the base model with detection ability for novel objects. Through experiments we demonstrate that our model can outperform baselines by a large margin for few-shot object detection, on multiple datasets and settings. Our model also exhibits fast adaptation speed to novel few-shot classes.
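
Feature reweighting can be pictured as scaling each channel of the base detector's feature map by a class-specific coefficient. The sketch below assumes one scalar weight per channel and plain element-wise multiplication; the abstract does not spell out the operation, so treat this as an illustrative assumption. The `reweight` helper and all values are hypothetical, and the weights are hard-coded here rather than produced by a trained meta-model.

```python
def reweight(feature_map, channel_weights):
    """Scale every channel of a feature map by its own coefficient.

    feature_map:     list of channels, each a 2-D list (height x width)
    channel_weights: one scalar per channel (assumed to come from a
                     meta-model fed with a few support examples)
    """
    if len(feature_map) != len(channel_weights):
        raise ValueError("need exactly one weight per channel")
    return [
        [[weight * value for value in row] for row in channel]
        for channel, weight in zip(feature_map, channel_weights)
    ]


# Two-channel, 2x2 feature map from a hypothetical base detector.
features = [
    [[1.0, 2.0], [3.0, 4.0]],  # channel 0
    [[5.0, 6.0], [7.0, 8.0]],  # channel 1
]

# Made-up weights for one novel class: emphasize channel 0, mute channel 1.
weights_for_class = [2.0, 0.0]
class_specific = reweight(features, weights_for_class)
print(class_specific)
```

Running the same base features through a different weight vector for each novel class yields class-specific features without retraining the base detector, which is consistent with the fast adaptation the abstract describes.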
