"Object Detection": models, code, and papers

3D Backbone Network for 3D Object Detection

Jan 24, 2019
Xuesong Li, Jose E Guivant, Ngaiming Kwok, Yongzhi Xu

The task of detecting 3D objects in point clouds plays a pivotal role in many real-world applications. However, 3D object detection performance lags behind that of 2D object detection due to the lack of powerful 3D feature extraction methods. To address this issue, we propose a 3D backbone network that learns rich 3D feature maps using sparse 3D CNN operations for 3D object detection in point clouds. The 3D backbone network can inherently learn 3D features from almost raw data, without compressing the point cloud into multiple 2D images, and generate rich feature maps for object detection. The sparse 3D CNN takes full advantage of the sparsity of the 3D point cloud to accelerate computation and save memory, which makes the 3D backbone network feasible. Experiments are conducted on the KITTI benchmark, and the results show that the proposed method can achieve state-of-the-art performance for 3D object detection.
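
The sparsity argument in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the voxel size, and the grid shape are assumptions made for illustration. It only shows why storing occupied voxels alone is cheaper than allocating a dense 3D grid.

```python
from collections import defaultdict


def voxelize(points, voxel_size):
    """Group 3D points by the voxel they fall into, keeping only occupied voxels."""
    voxels = defaultdict(list)
    for x, y, z in points:
        # Floor division maps each coordinate to an integer voxel index.
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        voxels[key].append((x, y, z))
    return dict(voxels)


def occupancy_ratio(voxels, grid_shape):
    """Fraction of the dense grid that actually contains points."""
    total = grid_shape[0] * grid_shape[1] * grid_shape[2]
    return len(voxels) / total


# Hypothetical toy point cloud inside a 10 x 10 x 2 grid of unit voxels.
points = [(0.1, 0.2, 0.3), (0.15, 0.25, 0.35), (5.0, 5.0, 1.0), (9.9, 0.1, 0.1)]
voxels = voxelize(points, voxel_size=1.0)
ratio = occupancy_ratio(voxels, grid_shape=(10, 10, 2))
```

A sparse convolution would then visit only the keys of `voxels` rather than every cell of the dense grid, which is where the computation and memory savings described in the abstract would come from.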


MOD: Benchmark for Military Object Detection

May 11, 2021
Xin Yi, Jiahao Wu, Bo Ma, Yangtong Ou, Longyao Liu

Object detection is widely studied in the computer vision field. In recent years, representative deep learning-based detection methods and solid benchmarks have been proposed, which has boosted the development of related research. However, there is so far no object detection benchmark targeted at the military field. To facilitate future military object detection research, we propose a novel, publicly available object detection benchmark for the military field called MOD, which contains 6,000 images and 17,465 labeled instances. Unlike previous benchmarks, objects in MOD pose unique challenges such as camouflage, blur, inter-class similarity, intra-class variance, and complex military environments. Experiments show that, under these challenges, existing detection methods perform poorly. To address this issue, we propose LGA-RCNN, which utilizes a loss-guided attention (LGA) module to highlight representative regions of objects. The highlighted local information is then fused with global information for precise classification and localization. Extensive experiments on MOD validate the effectiveness of our method and the whole dataset can be found at

* We strongly request the withdrawal due to the following reasons: 1. The model is prone to overfitting on such a small dataset. 2. In section 4, the Gaussian mask is the key to the method, but our ablation experiment on the number of masks is not detailed enough. 3. There are certain problems with the hyperparameter settings of the model. We sincerely apologize for the inconvenience caused.

Incremental-DETR: Incremental Few-Shot Object Detection via Self-Supervised Learning

May 19, 2022
Na Dong, Yongqiang Zhang, Mingli Ding, Gim Hee Lee

Incremental few-shot object detection aims at detecting novel classes, without forgetting knowledge of the base classes, using only a few labeled training samples from the novel classes. Most related prior work addresses incremental object detection and relies on the availability of abundant training samples per novel class, which substantially limits scalability to real-world settings where novel data can be scarce. In this paper, we propose Incremental-DETR, which performs incremental few-shot object detection via fine-tuning and self-supervised learning on the DETR object detector. To alleviate severe over-fitting with few novel class data, we first fine-tune the class-specific components of DETR with self-supervision from additional object proposals generated using Selective Search as pseudo labels. We further introduce an incremental few-shot fine-tuning strategy with knowledge distillation on the class-specific components of DETR to encourage the network to detect novel classes without catastrophic forgetting. Extensive experiments conducted on standard incremental object detection and incremental few-shot object detection settings show that our approach outperforms state-of-the-art methods by a large margin.
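
For readers unfamiliar with knowledge distillation, the general idea can be sketched as follows. This is a generic temperature-scaled logit distillation loss, not the specific loss used in Incremental-DETR (the abstract does not give its form); the function names, the temperature value, and the example logits are all assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [v / temperature for v in logits]
    peak = max(scaled)
    exps = [math.exp(v - peak) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    return kl * temperature ** 2


# Hypothetical base-class logits from the frozen old model (teacher)
# and from the model being fine-tuned on novel classes (student).
teacher = [2.0, 0.5, -1.0]
loss_same = distillation_loss(teacher, teacher)
loss_drifted = distillation_loss([0.0, 1.5, 1.0], teacher)
```

The loss is zero when the student reproduces the teacher's outputs and grows as the student drifts away from them, which is the mechanism that discourages forgetting of base classes during fine-tuning.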

* 11 pages, 2 figures 

Decoupled Self Attention for Accurate One Stage Object Detection

Dec 15, 2020
Kehe WU, Zuge Chen, Qi MA, Xiaoliang Zhang, Wei Li

As object detection datasets are smaller in scale than the image recognition dataset ImageNet, transfer learning has become a basic training method for deep learning object detection models: the backbone network of the detection model is pretrained on ImageNet to extract features for the classification and localization subtasks. However, the classification task focuses on the salient region features of an object, while the localization task focuses on its edge features, so there is a certain deviation between the features extracted by the pretrained backbone network and the features needed for the localization task. To solve this problem, this paper proposes a decoupled self-attention (DSA) module for one-stage object detection models. DSA includes two decoupled self-attention branches, so it can extract appropriate features for different tasks. It is located between the FPN and the head networks of the subtasks, where it extracts global features from the FPN fused features for each task independently. Although the DSA module is simple, it can effectively improve object detection performance and can be easily embedded in many detection models. Our experiments are based on the representative one-stage detection model RetinaNet. On the COCO dataset, when ResNet50 and ResNet101 are used as backbone networks, detection performance increases by 0.4% AP and 0.5% AP respectively. When the DSA module and the object confidence task are applied in RetinaNet together, detection performance with ResNet50 and ResNet101 increases by 1.0% AP and 1.4% AP respectively. The experimental results show the effectiveness of the DSA module. Code is at:

* 15 pages, 5 figures 

A Survey on Deep Domain Adaptation and Tiny Object Detection Challenges, Techniques and Datasets

Jul 16, 2021
Muhammed Muzammul, Xi Li

This survey paper analyzed computer vision-based object detection challenges and the solutions offered by different techniques. We mainly highlighted object detection through three trending strategies. In part 1), we covered domain-adaptive deep learning-based approaches (discrepancy-based, adversarial-based, reconstruction-based, hybrid). We examined general as well as tiny object detection-related challenges and offered solutions through historical and comparative analysis. In part 2), we mainly focused on tiny object detection techniques (multi-scale feature learning, data augmentation, training strategy (TS), context-based detection, GAN-based detection). In part 3), to obtain informative findings, we discussed different object detection methods, i.e., convolutions and convolutional neural networks (CNN), and pooling operations with trending types. Furthermore, we explained results with the help of some object detection algorithms, i.e., R-CNN, Fast R-CNN, Faster R-CNN, YOLO, and SSD, which are generally considered the backbone of CV, CNN, and OD. We performed comparative analysis on different datasets such as MS-COCO, PASCAL VOC07/12, and ImageNet to analyze results and present findings. Finally, we outlined future directions along with the existing challenges of the field. In the future, OD methods and models can be analyzed for real-time object detection and tracking strategies.


Learning to Detect Open Carry and Concealed Object with 77GHz Radar

Oct 31, 2021
Xiangyu Gao, Hui Liu, Sumit Roy, Guanbin Xing, Ali Alansari, Youchen Luo

Detecting harmful carried objects plays a key role in intelligent surveillance systems and has widespread applications, for example, in airport security. In this paper, we focus on the relatively unexplored area of using low-cost 77GHz mmWave radar for the carried object detection problem. The proposed system is capable of detecting three classes of objects in real time (laptop, phone, and knife), both in open carry cases and in concealed cases where objects are hidden by clothes or bags. This capability is achieved by initial signal processing for localization and for generating range-azimuth-elevation image cubes, followed by a deep learning-based prediction network and a multi-shot post-processing module for detecting objects. Extensive experiments validating the system's performance in detecting open carry and concealed objects are presented, using a self-built radar-camera testbed and dataset. Additionally, the influence of different inputs, factors, and parameters on system performance is analyzed, providing an intuitive understanding of the system. This system would be the very first baseline for future work aiming to detect carried objects using 77GHz radar.

* 12 pages 

DenseBox: Unifying Landmark Localization with End to End Object Detection

Sep 19, 2015
Lichao Huang, Yi Yang, Yafeng Deng, Yinan Yu

How well can a single fully convolutional neural network (FCN) perform on object detection? We introduce DenseBox, a unified end-to-end FCN framework that directly predicts bounding boxes and object class confidences through all locations and scales of an image. Our contribution is two-fold. First, we show that a single FCN, if designed and optimized carefully, can detect multiple different objects extremely accurately and efficiently. Second, we show that when landmark localization is incorporated through multi-task learning, DenseBox further improves object detection accuracy. We present experimental results on public benchmark datasets, including MALF face detection and KITTI car detection, which indicate that DenseBox is the state-of-the-art system for detecting challenging objects such as faces and cars.


Human Object Interaction Detection using Two-Direction Spatial Enhancement and Exclusive Object Prior

May 07, 2021
Lu Liu, Robby T. Tan

Human-Object Interaction (HOI) detection aims to detect visual relations between humans and objects in images. One significant problem in HOI detection is that non-interactive human-object pairs can easily be mis-grouped and misclassified as an action, especially when humans are close together and performing similar actions in the scene. To address the mis-grouping problem, we propose a spatial enhancement approach that enforces fine-level spatial constraints in two directions: from human body parts to the object center, and from object parts to the human center. At inference, we propose a human-object regrouping approach that considers the object-exclusive property of an action, where the target object should not be shared by more than one human. By suppressing non-interactive pairs, our approach can reduce false positives. Experiments on the V-COCO and HICO-DET datasets demonstrate that our approach is more robust than existing methods in the presence of multiple humans and objects in the scene.
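
The object-exclusive constraint described above (a target object should not be shared by more than one human) can be sketched as a simple filtering step. This is only an illustration under assumptions: the abstract does not specify the regrouping procedure, so the highest-score-wins rule, the dictionary layout, and the function name below are hypothetical.

```python
def regroup_exclusive(candidates):
    """Keep, for each (object, action) pair, only the highest-scoring human.

    Each candidate is a dict with the keys "human", "object", "action", "score".
    """
    best = {}
    for cand in candidates:
        key = (cand["object"], cand["action"])
        # Assumed rule: a later candidate replaces the kept one only if it scores higher.
        if key not in best or cand["score"] > best[key]["score"]:
            best[key] = cand
    return sorted(best.values(), key=lambda c: c["score"], reverse=True)


# Hypothetical detections: humans 0 and 1 both claim to ride object 7.
candidates = [
    {"human": 0, "object": 7, "action": "ride", "score": 0.9},
    {"human": 1, "object": 7, "action": "ride", "score": 0.4},
    {"human": 1, "object": 8, "action": "ride", "score": 0.6},
]
kept = regroup_exclusive(candidates)
kept_pairs = [(c["human"], c["object"]) for c in kept]
```

In this toy case the lower-scoring claim of human 1 on object 7 is suppressed, which mirrors how removing non-interactive pairs would reduce false positives.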
