Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"Object Detection": models, code, and papers

Closing the Generalization Gap in One-Shot Object Detection

Nov 09, 2020
Claudio Michaelis, Matthias Bethge, Alexander S. Ecker

Despite substantial progress in object detection and few-shot learning, detecting objects based on a single example - one-shot object detection - remains a challenge: trained models exhibit a substantial generalization gap, where object categories used during training are detected much more reliably than novel ones. Here we show that this generalization gap can be nearly closed by increasing the number of object categories used during training. Our results show that the models switch from memorizing individual categories to learning object similarity over the category distribution, enabling strong generalization at test time. Importantly, in this regime standard methods to improve object detection models like stronger backbones or longer training schedules also benefit novel categories, which was not the case for smaller datasets like COCO. Our results suggest that the key to strong few-shot detection models may not lie in sophisticated metric learning approaches, but instead in scaling the number of categories. Future data annotation efforts should therefore focus on wider datasets and annotate a larger number of categories rather than gathering more images or instances per category.


Few-shot Object Counting and Detection

Jul 28, 2022
Thanh Nguyen, Chau Pham, Khoi Nguyen, Minh Hoai

We tackle a new task of few-shot object counting and detection. Given a few exemplar bounding boxes of a target object class, we seek to count and detect all objects of the target class. This task shares the same supervision as the few-shot object counting but additionally outputs the object bounding boxes along with the total object count. To address this challenging problem, we introduce a novel two-stage training strategy and a novel uncertainty-aware few-shot object detector: Counting-DETR. The former is aimed at generating pseudo ground-truth bounding boxes to train the latter. The latter leverages the pseudo ground-truth provided by the former but takes the necessary steps to account for the imperfection of pseudo ground-truth. To validate the performance of our method on the new task, we introduce two new datasets named FSCD-147 and FSCD-LVIS. Both datasets contain images with complex scenes, multiple object classes per image, and a huge variation in object shapes, sizes, and appearance. Our proposed approach outperforms very strong baselines adapted from few-shot object counting and few-shot object detection with a large margin in both counting and detection metrics. The code and models are available at

* Accepted to ECCV 2022; The first two authors contribute equally 

Hi Detector, What's Wrong with that Object? Identifying Irregular Object From Images by Modelling the Detection Score Distribution

Feb 14, 2016
Peng Wang, Lingqiao Liu, Chunhua Shen, Anton van den Hengel, Heng Tao Shen

In this work, we study the challenging problem of identifying the irregular status of objects from images in an "open world" setting, that is, distinguishing the irregular status of an object category from its regular status as well as objects from other categories in the absence of "irregular object" training data. To address this problem, we propose a novel approach by inspecting the distribution of the detection scores at multiple image regions based on the detector trained from the "regular object" and "other objects". The key observation motivating our approach is that for "regular object" images as well as "other objects" images, the region-level scores follow their own essential patterns in terms of both the score values and the spatial distributions while the detection scores obtained from an "irregular object" image tend to break these patterns. To model this distribution, we propose to use Gaussian Processes (GP) to construct two separate generative models for the case of the "regular object" and the "other objects". More specifically, we design a new covariance function to simultaneously model the detection score at a single region and the score dependencies at multiple regions. We finally demonstrate the superior performance of our method on a large dataset newly proposed in this paper.

* 10 pages 

3D Backbone Network for 3D Object Detection

Jan 24, 2019
Xuesong Li, Jose E Guivant, Ngaiming Kwok, Yongzhi Xu

The task of detecting 3D objects in point cloud has a pivotal role in many real-world applications. However, 3D object detection performance is behind that of 2D object detection due to the lack of powerful 3D feature extraction methods. In order to address this issue, we propose to build a 3D backbone network to learn rich 3D feature maps by using sparse 3D CNN operations for 3D object detection in point cloud. The 3D backbone network can inherently learn 3D features from almost raw data without compressing point cloud into multiple 2D images and generate rich feature maps for object detection. The sparse 3D CNN takes full advantages of the sparsity in the 3D point cloud to accelerate computation and save memory, which makes the 3D backbone network achievable. Empirical experiments are conducted on the KITTI benchmark and results show that the proposed method can achieve state-of-the-art performance for 3D object detection.


MOD: Benchmark for Military Object Detection

May 11, 2021
Xin Yi, Jiahao Wu, Bo Ma, Yangtong Ou, Longyao Liu

Object detection is widely studied in computer vision filed. In recent years, certain representative deep learning based detection methods along with solid benchmarks are proposed, which boosts the development of related researchs. However, there is no object detection benchmark targeted at military field so far. To facilitate future military object detection research, we propose a novel, publicly available object detection benchmark in military filed called MOD, which contains 6,000 images and 17,465 labeled instances. Unlike previous benchmarks, objects in MOD contain unique challenges such as camouflage, blur, inter-class similarity, intra-class variance and complex military environment. Experiments show that under above chanllenges, existing detection methods suffer from undesirable performance. To address this issue, we propose LGA-RCNN which utilizes a loss-guided attention (LGA) module to highlight representative region of objects. Then, those highlighted local information are fused with global information for precise classification and localization. Extensive experiments on MOD validate the effectiveness of our method and the whole dataset can be found at

* We strongly request the withdrawal due to the following reasons: 1. The model is prone to overfitting on such a small dataset. 2. In section 4, the Gaussian mask is the key to the method, but our ablation experiment on the number of masks is not detailed enough. 3. There are certain problems with the hyperparameter settings of the model. We sincerely apologize for the inconvenience caused 

Incremental-DETR: Incremental Few-Shot Object Detection via Self-Supervised Learning

May 19, 2022
Na Dong, Yongqiang Zhang, Mingli Ding, Gim Hee Lee

Incremental few-shot object detection aims at detecting novel classes without forgetting knowledge of the base classes with only a few labeled training data from the novel classes. Most related prior works are on incremental object detection that rely on the availability of abundant training samples per novel class that substantially limits the scalability to real-world setting where novel data can be scarce. In this paper, we propose the Incremental-DETR that does incremental few-shot object detection via fine-tuning and self-supervised learning on the DETR object detector. To alleviate severe over-fitting with few novel class data, we first fine-tune the class-specific components of DETR with self-supervision from additional object proposals generated using Selective Search as pseudo labels. We further introduce a incremental few-shot fine-tuning strategy with knowledge distillation on the class-specific components of DETR to encourage the network in detecting novel classes without catastrophic forgetting. Extensive experiments conducted on standard incremental object detection and incremental few-shot object detection settings show that our approach significantly outperforms state-of-the-art methods by a large margin.

* 11 pages, 2 figures 

Decoupled Self Attention for Accurate One Stage Object Detection

Dec 15, 2020
Kehe WU, Zuge Chen, Qi MA, Xiaoliang Zhang, Wei Li

As the scale of object detection dataset is smaller than that of image recognition dataset ImageNet, transfer learning has become a basic training method for deep learning object detection models, which will pretrain the backbone network of object detection model on ImageNet dataset to extract features for classification and localization subtasks. However, the classification task focuses on the salient region features of object, while the location task focuses on the edge features of object, so there is certain deviation between the features extracted by pretrained backbone network and the features used for localization task. In order to solve this problem, a decoupled self attention(DSA) module is proposed for one stage object detection models in this paper. DSA includes two decoupled self-attention branches, so it can extract appropriate features for different tasks. It is located between FPN and head networks of subtasks, so it is used to extract global features based on FPN fused features for different tasks independently. Although the network of DSA module is simple, but it can effectively improve the performance of object detection, also it can be easily embedded in many detection models. Our experiments are based on the representative one-stage detection model RetinaNet. In COCO dataset, when ResNet50 and ResNet101 are used as backbone networks, the detection performances can be increased by 0.4% AP and 0.5% AP respectively. When DSA module and object confidence task are applied in RetinaNet together, the detection performances based on ResNet50 and ResNet101 can be increased by 1.0% AP and 1.4% AP respectively. The experiment results show the effectiveness of DSA module. Code is at:

* 15 pages, 5 figures