Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"Object Detection": models, code, and papers

Decoupled Adaptation for Cross-Domain Object Detection

Oct 06, 2021
Junguang Jiang, Baixu Chen, Jianmin Wang, Mingsheng Long

Cross-domain object detection is more challenging than object classification since multiple objects exist in an image and the location of each object is unknown in the unlabeled target domain. As a result, when we adapt features of different objects to enhance the transferability of the detector, the features of the foreground and the background are easy to be confused, which may hurt the discriminability of the detector. Besides, previous methods focused on category adaptation but ignored another important part for object detection, i.e., the adaptation on bounding box regression. To this end, we propose D-adapt, namely Decoupled Adaptation, to decouple the adversarial adaptation and the training of the detector. Besides, we fill the blank of regression domain adaptation in object detection by introducing a bounding box adaptor. Experiments show that D-adapt achieves state-of-the-art results on four cross-domain object detection tasks and yields 17% and 21% relative improvement on benchmark datasets Clipart1k and Comic2k in particular.


IIIT-AR-13K: A New Dataset for Graphical Object Detection in Documents

Aug 06, 2020
Ajoy Mondal, Peter Lipps, C. V. Jawahar

We introduce a new dataset for graphical object detection in business documents, more specifically annual reports. This dataset, IIIT-AR-13k, is created by manually annotating the bounding boxes of graphical or page objects in publicly available annual reports. This dataset contains a total of 13k annotated page images with objects in five different popular categories - table, figure, natural image, logo, and signature. It is the largest manually annotated dataset for graphical object detection. Annual reports created in multiple languages for several years from various companies bring high diversity into this dataset. We benchmark IIIT-AR-13K dataset with two state of the art graphical object detection techniques using Faster R-CNN [20] and Mask R-CNN [11] and establish high baselines for further research. Our dataset is highly effective as training data for developing practical solutions for graphical object detection in both business documents and technical articles. By training with IIIT-AR-13K, we demonstrate the feasibility of a single solution that can report superior performance compared to the equivalent ones trained with a much larger amount of data, for table detection. We hope that our dataset helps in advancing the research for detecting various types of graphical objects in business documents.

* Published in DAS 2020 
* 15 pages, DAS 2020 

Relation Networks for Object Detection

Jun 14, 2018
Han Hu, Jiayuan Gu, Zheng Zhang, Jifeng Dai, Yichen Wei

Although it is well believed for years that modeling relations between objects would help object recognition, there has not been evidence that the idea is working in the deep learning era. All state-of-the-art object detection systems still rely on recognizing object instances individually, without exploiting their relations during learning. This work proposes an object relation module. It processes a set of objects simultaneously through interaction between their appearance feature and geometry, thus allowing modeling of their relations. It is lightweight and in-place. It does not require additional supervision and is easy to embed in existing networks. It is shown effective on improving object recognition and duplicate removal steps in the modern object detection pipeline. It verifies the efficacy of modeling object relations in CNN based detection. It gives rise to the first fully end-to-end object detector.


AmphibianDetector: adaptive computation for moving objects detection

Nov 15, 2020
David Svitov, Sergey Alyamkin

Convolutional neural networks (CNN) allow achieving the highest accuracy for the task of object detection in images. Major challenges in further development of object detectors are false-positive detections and high demand of processing power. In this paper, we propose an approach to object detection, which makes it possible to reduce the number of false-positive detections by processing only moving objects and reduce required processing power for algorithm inference. The proposed approach is modification of the CNN already trained for object detection task. This method can be used to improve the accuracy of an existing system by applying minor changes to the existing algorithm. The efficiency of the proposed approach was demonstrated on the open dataset "CDNet2014 pedestrian". The implementation of the method proposed in the article is available on the GitHub:

* 12 pages, 6 figures, 1 table 

A tomographic workflow to enable deep learning for X-ray based foreign object detection

Jan 28, 2022
Mathé T. Zeegers, Tristan van Leeuwen, Daniël M. Pelt, Sophia Bethany Coban, Robert van Liere, Kees Joost Batenburg

Detection of unwanted (`foreign') objects within products is a common procedure in many branches of industry for maintaining production quality. X-ray imaging is a fast, non-invasive and widely applicable method for foreign object detection. Deep learning has recently emerged as a powerful approach for recognizing patterns in radiographs (i.e., X-ray images), enabling automated X-ray based foreign object detection. However, these methods require a large number of training examples and manual annotation of these examples is a subjective and laborious task. In this work, we propose a Computed Tomography (CT) based method for producing training data for supervised learning of foreign object detection, with minimal labour requirements. In our approach, a few representative objects are CT scanned and reconstructed in 3D. The radiographs that have been acquired as part of the CT-scan data serve as input for the machine learning method. High-quality ground truth locations of the foreign objects are obtained through accurate 3D reconstructions and segmentations. Using these segmented volumes, corresponding 2D segmentations are obtained by creating virtual projections. We outline the benefits of objectively and reproducibly generating training data in this way compared to conventional radiograph annotation. In addition, we show how the accuracy depends on the number of objects used for the CT reconstructions. The results show that in this workflow generally only a relatively small number of representative objects (i.e., fewer than 10) are needed to achieve adequate detection performance in an industrial setting. Moreover, for real experimental data we show that the workflow leads to higher foreign object detection accuracies than with standard radiograph annotation.

* This paper is under consideration at Expert Systems with Applications. 22 pages, 15 figures 

On Applying Machine Learning/Object Detection Models for Analysing Digitally Captured Physical Prototypes from Engineering Design Projects

May 07, 2019
Jorgen F. Erichsen, Sampsa Kohtala, Martin Steinert, Torgeir Welo

While computer vision has received increasing attention in computer science over the last decade, there are few efforts in applying this to leverage engineering design research. Existing datasets and technologies allow researchers to capture and access more observations and video files, hence analysis is becoming a limiting factor. Therefore, this paper is investigating the application of machine learning, namely object detection methods to aid in the analysis of physical porotypes. With access to a large dataset of digitally captured physical prototypes from early-stage development projects (5950 images from 850 prototypes), the authors investigate applications that can be used for analysing this dataset. The authors retrained two pre-trained object detection models from two known frameworks, the TensorFlow Object Detection API and Darknet, using custom image sets of images of physical prototypes. As a result, a proof-of-concept of four trained models are presented; two models for detecting samples of wood-based sheet materials and two models for detecting samples containing microcontrollers. All models are evaluated using standard metrics for object detection model performance and the applicability of using object detection models in engineering design research is discussed. Results indicate that the models can successfully classify the type of material and type of pre-made component, respectively. However, more work is needed to fully integrate object detection models in the engineering design analysis workflow. The authors also extrapolate that the use of object detection for analysing images of physical prototypes will substantially reduce the effort required for analysing large datasets in engineering design research.

* 13 pages, 4 tables, 3 figures 

FFAVOD: Feature Fusion Architecture for Video Object Detection

Sep 15, 2021
Hughes Perreault, Guillaume-Alexandre Bilodeau, Nicolas Saunier, Maguelonne Héritier

A significant amount of redundancy exists between consecutive frames of a video. Object detectors typically produce detections for one image at a time, without any capabilities for taking advantage of this redundancy. Meanwhile, many applications for object detection work with videos, including intelligent transportation systems, advanced driver assistance systems and video surveillance. Our work aims at taking advantage of the similarity between video frames to produce better detections. We propose FFAVOD, standing for feature fusion architecture for video object detection. We first introduce a novel video object detection architecture that allows a network to share feature maps between nearby frames. Second, we propose a feature fusion module that learns to merge feature maps to enhance them. We show that using the proposed architecture and the fusion module can improve the performance of three base object detectors on two object detection benchmarks containing sequences of moving road users. Additionally, to further increase performance, we propose an improvement to the SpotNet attention module. Using our architecture on the improved SpotNet detector, we obtain the state-of-the-art performance on the UA-DETRAC public benchmark as well as on the UAVDT dataset. Code is available at

* Accepted for publication in Pattern Recognition Letters 

Point2Seq: Detecting 3D Objects as Sequences

Mar 25, 2022
Yujing Xue, Jiageng Mao, Minzhe Niu, Hang Xu, Michael Bi Mi, Wei Zhang, Xiaogang Wang, Xinchao Wang

We present a simple and effective framework, named Point2Seq, for 3D object detection from point clouds. In contrast to previous methods that normally {predict attributes of 3D objects all at once}, we expressively model the interdependencies between attributes of 3D objects, which in turn enables a better detection accuracy. Specifically, we view each 3D object as a sequence of words and reformulate the 3D object detection task as decoding words from 3D scenes in an auto-regressive manner. We further propose a lightweight scene-to-sequence decoder that can auto-regressively generate words conditioned on features from a 3D scene as well as cues from the preceding words. The predicted words eventually constitute a set of sequences that completely describe the 3D objects in the scene, and all the predicted sequences are then automatically assigned to the respective ground truths through similarity-based sequence matching. Our approach is conceptually intuitive and can be readily plugged upon most existing 3D-detection backbones without adding too much computational overhead; the sequential decoding paradigm we proposed, on the other hand, can better exploit information from complex 3D scenes with the aid of preceding predicted words. Without bells and whistles, our method significantly outperforms previous anchor- and center-based 3D object detection frameworks, yielding the new state of the art on the challenging ONCE dataset as well as the Waymo Open Dataset. Code is available at \url{}.

* To appear in CVPR2022 

Towards Deeper Understanding of Camouflaged Object Detection

May 23, 2022
Yunqiu Lv, Jing Zhang, Yuchao Dai, Aixuan Li, Nick Barnes, Deng-Ping Fan

Preys in the wild evolve to be camouflaged to avoid being recognized by predators. In this way, camouflage acts as a key defence mechanism across species that is critical to survival. To detect and segment the whole scope of a camouflaged object, camouflaged object detection (COD) is introduced as a binary segmentation task, with the binary ground truth camouflage map indicating the exact regions of the camouflaged objects. In this paper, we revisit this task and argue that the binary segmentation setting fails to fully understand the concept of camouflage. We find that explicitly modeling the conspicuousness of camouflaged objects against their particular backgrounds can not only lead to a better understanding about camouflage, but also provide guidance to designing more sophisticated camouflage techniques. Furthermore, we observe that it is some specific parts of camouflaged objects that make them detectable by predators. With the above understanding about camouflaged objects, we present the first triple-task learning framework to simultaneously localize, segment and rank camouflaged objects, indicating the conspicuousness level of camouflage. As no corresponding datasets exist for either the localization model or the ranking model, we generate localization maps with an eye tracker, which are then processed according to the instance level labels to generate our ranking-based training and testing dataset. We also contribute the largest COD testing set to comprehensively analyse performance of the camouflaged object detection models. Experimental results show that our triple-task learning framework achieves new state-of-the-art, leading to a more explainable camouflaged object detection network. Our code, data and results are available at:


Deep Learning on Monocular Object Pose Detection and Tracking: A Comprehensive Overview

May 29, 2021
Zhaoxin Fan, Yazhi Zhu, Yulin He, Qi Sun, Hongyan Liu, Jun He

Object pose detection and tracking has recently attracted increasing attention due to its wide applications in many areas, such as autonomous driving, robotics, and augmented reality. Among methods for object pose detection and tracking, deep learning is the most promising one that has shown better performance than others. However, there is lack of survey study about latest development of deep learning based methods. Therefore, this paper presents a comprehensive review of recent progress in object pose detection and tracking that belongs to the deep learning technical route. To achieve a more thorough introduction, the scope of this paper is limited to methods taking monocular RGB/RGBD data as input, covering three kinds of major tasks: instance-level monocular object pose detection, category-level monocular object pose detection, and monocular object pose tracking. In our work, metrics, datasets, and methods about both detection and tracking are presented in detail. Comparative results of current state-of-the-art methods on several publicly available datasets are also presented, together with insightful observations and inspiring future research directions.

* 24 pages,8 figures