Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"Object Detection": models, code, and papers

RestoreDet: Degradation Equivariant Representation for Object Detection in Low Resolution Images

Jan 07, 2022
Ziteng Cui, Yingying Zhu, Lin Gu, Guo-Jun Qi, Xiaoxiao Li, Peng Gao, Zenghui Zhang, Tatsuya Harada

Image restoration algorithms such as super resolution (SR) are indispensable pre-processing modules for object detection in degraded images. However, most of these algorithms assume the degradation is fixed and known a priori. When the real degradation is unknown or differs from assumption, both the pre-processing module and the consequent high-level task such as object detection would fail. Here, we propose a novel framework, RestoreDet, to detect objects in degraded low resolution images. RestoreDet utilizes the downsampling degradation as a kind of transformation for self-supervised signals to explore the equivariant representation against various resolutions and other degradation conditions. Specifically, we learn this intrinsic visual structure by encoding and decoding the degradation transformation from a pair of original and randomly degraded images. The framework could further take the advantage of advanced SR architectures with an arbitrary resolution restoring decoder to reconstruct the original correspondence from the degraded input image. Both the representation learning and object detection are optimized jointly in an end-to-end training fashion. RestoreDet is a generic framework that could be implemented on any mainstream object detection architectures. The extensive experiment shows that our framework based on CenterNet has achieved superior performance compared with existing methods when facing variant degradation situations. Our code would be released soon.

* 11 pages, 3figures 

Scaling Object Detection by Transferring Classification Weights

Sep 15, 2019
Jason Kuen, Federico Perazzi, Zhe Lin, Jianming Zhang, Yap-Peng Tan

Large scale object detection datasets are constantly increasing their size in terms of the number of classes and annotations count. Yet, the number of object-level categories annotated in detection datasets is an order of magnitude smaller than image-level classification labels. State-of-the art object detection models are trained in a supervised fashion and this limits the number of object classes they can detect. In this paper, we propose a novel weight transfer network (WTN) to effectively and efficiently transfer knowledge from classification network's weights to detection network's weights to allow detection of novel classes without box supervision. We first introduce input and feature normalization schemes to curb the under-fitting during training of a vanilla WTN. We then propose autoencoder-WTN (AE-WTN) which uses reconstruction loss to preserve classification network's information over all classes in the target latent space to ensure generalization to novel classes. Compared to vanilla WTN, AE-WTN obtains absolute performance gains of 6% on two Open Images evaluation sets with 500 seen and 57 novel classes respectively, and 25% on a Visual Genome evaluation set with 200 novel classes. The code is available at

* ICCV 2019 

A Semantic Consistency Feature Alignment Object Detection Model Based on Mixed-Class Distribution Metrics

Jun 12, 2022
Lijun Gou, Jinrong Yang, Hangcheng Yu, Pan Wang, Xiaoping Li, Chao Deng

Unsupervised domain adaptation is critical in various computer vision tasks, such as object detection, instance segmentation, etc. They attempt to reduce domain bias-induced performance degradation while also promoting model application speed. Previous works in domain adaptation object detection attempt to align image-level and instance-level shifts to eventually minimize the domain discrepancy, but they may align single-class features to mixed-class features in image-level domain adaptation because each image in the object detection task may be more than one class and object. In order to achieve single-class with single-class alignment and mixed-class with mixed-class alignment, we treat the mixed-class of the feature as a new class and propose a mixed-classes $H-divergence$ for object detection to achieve homogenous feature alignment and reduce negative transfer. Then, a Semantic Consistency Feature Alignment Model (SCFAM) based on mixed-classes $H-divergence$ was also presented. To improve single-class and mixed-class semantic information and accomplish semantic separation, the SCFAM model proposes Semantic Prediction Models (SPM) and Semantic Bridging Components (SBC). And the weight of the pix domain discriminator loss is then changed based on the SPM result to reduce sample imbalance. Extensive unsupervised domain adaption experiments on widely used datasets illustrate our proposed approach's robust object detection in domain bias settings.


Multi-Evidence Filtering and Fusion for Multi-Label Classification, Object Detection and Semantic Segmentation Based on Weakly Supervised Learning

Feb 26, 2018
Weifeng Ge, Sibei Yang, Yizhou Yu

Supervised object detection and semantic segmentation require object or even pixel level annotations. When there exist image level labels only, it is challenging for weakly supervised algorithms to achieve accurate predictions. The accuracy achieved by top weakly supervised algorithms is still significantly lower than their fully supervised counterparts. In this paper, we propose a novel weakly supervised curriculum learning pipeline for multi-label object recognition, detection and semantic segmentation. In this pipeline, we first obtain intermediate object localization and pixel labeling results for the training images, and then use such results to train task-specific deep networks in a fully supervised manner. The entire process consists of four stages, including object localization in the training images, filtering and fusing object instances, pixel labeling for the training images, and task-specific network training. To obtain clean object instances in the training images, we propose a novel algorithm for filtering, fusing and classifying object instances collected from multiple solution mechanisms. In this algorithm, we incorporate both metric learning and density-based clustering to filter detected object instances. Experiments show that our weakly supervised pipeline achieves state-of-the-art results in multi-label image classification as well as weakly supervised object detection and very competitive results in weakly supervised semantic segmentation on MS-COCO, PASCAL VOC 2007 and PASCAL VOC 2012.

* accepted by IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2018 

Aligning Pretraining for Detection via Object-Level Contrastive Learning

Jun 04, 2021
Fangyun Wei, Yue Gao, Zhirong Wu, Han Hu, Stephen Lin

Image-level contrastive representation learning has proven to be highly effective as a generic model for transfer learning. Such generality for transfer learning, however, sacrifices specificity if we are interested in a certain downstream task. We argue that this could be sub-optimal and thus advocate a design principle which encourages alignment between the self-supervised pretext task and the downstream task. In this paper, we follow this principle with a pretraining method specifically designed for the task of object detection. We attain alignment in the following three aspects: 1) object-level representations are introduced via selective search bounding boxes as object proposals; 2) the pretraining network architecture incorporates the same dedicated modules used in the detection pipeline (e.g. FPN); 3) the pretraining is equipped with object detection properties such as object-level translation invariance and scale invariance. Our method, called Selective Object COntrastive learning (SoCo), achieves state-of-the-art results for transfer performance on COCO detection using a Mask R-CNN framework. Code and models will be made available.


A Benchmark dataset for both underwater image enhancement and underwater object detection

Jun 29, 2020
Long Chen, Lei Tong, Feixiang Zhou, Zheheng Jiang, Zhenyang Li, Jialin Lv, Junyu Dong, Huiyu Zhou

Underwater image enhancement is such an important vision task due to its significance in marine engineering and aquatic robot. It is usually work as a pre-processing step to improve the performance of high level vision tasks such as underwater object detection. Even though many previous works show the underwater image enhancement algorithms can boost the detection accuracy of the detectors, no work specially focus on investigating the relationship between these two tasks. This is mainly because existing underwater datasets lack either bounding box annotations or high quality reference images, based on which detection accuracy or image quality assessment metrics are calculated. To investigate how the underwater image enhancement methods influence the following underwater object detection tasks, in this paper, we provide a large-scale underwater object detection dataset with both bounding box annotations and high quality reference images, namely OUC dataset. The OUC dataset provides a platform for researchers to comprehensive study the influence of underwater image enhancement algorithms on the underwater object detection task.


Localized Vision-Language Matching for Open-vocabulary Object Detection

May 12, 2022
Maria A. Bravo, Sudhanshu Mittal, Thomas Brox

In this work, we propose an open-world object detection method that, based on image-caption pairs, learns to detect novel object classes along with a given set of known classes. It is a two-stage training approach that first uses a location-guided image-caption matching technique to learn class labels for both novel and known classes in a weakly-supervised manner and second specializes the model for the object detection task using known class annotations. We show that a simple language model fits better than a large contextualized language model for detecting novel objects. Moreover, we introduce a consistency-regularization technique to better exploit image-caption pair information. Our method compares favorably to existing open-world detection approaches while being data-efficient.


\emph{cm}SalGAN: RGB-D Salient Object Detection with Cross-View Generative Adversarial Networks

Dec 21, 2019
Bo Jiang, Zitai Zhou, Xiao Wang, Jin Tang

Image salient object detection (SOD) is an active research topic in computer vision and multimedia area. Fusing complementary information of RGB and depth has been demonstrated to be effective for image salient object detection which is known as RGB-D salient object detection problem. The main challenge for RGB-D salient object detection is how to exploit the salient cues of both intra-modality (RGB, depth) and cross-modality simultaneously which is known as cross-modality detection problem. In this paper, we tackle this challenge by designing a novel cross-modality Saliency Generative Adversarial Network (\emph{cm}SalGAN). \emph{cm}SalGAN aims to learn an optimal view-invariant and consistent pixel-level representation for RGB and depth images via a novel adversarial learning framework, which thus incorporates both information of intra-view and correlation information of cross-view images simultaneously for RGB-D saliency detection problem. To further improve the detection results, the attention mechanism and edge detection module are also incorporated into \emph{cm}SalGAN. The entire \emph{cm}SalGAN can be trained in an end-to-end manner by using the standard deep neural network framework. Experimental results show that \emph{cm}SalGAN achieves the new state-of-the-art RGB-D saliency detection performance on several benchmark datasets.

* Submitted to IEEE Transactions on Multimedia 

A novel method for object detection using deep learning and CAD models

Feb 12, 2021
Igor Garcia Ballhausen Sampaio, Luigy Machaca, José Viterbo, Joris Guérin

Object Detection (OD) is an important computer vision problem for industry, which can be used for quality control in the production lines, among other applications. Recently, Deep Learning (DL) methods have enabled practitioners to train OD models performing well on complex real world images. However, the adoption of these models in industry is still limited by the difficulty and the significant cost of collecting high quality training datasets. On the other hand, when applying OD to the context of production lines, CAD models of the objects to be detected are often available. In this paper, we introduce a fully automated method that uses a CAD model of an object and returns a fully trained OD model for detecting this object. To do this, we created a Blender script that generates realistic labeled datasets of images containing the object, which are then used for training the OD model. The method is validated experimentally on two practical examples, showing that this approach can generate OD models performing well on real images, while being trained only on synthetic images. The proposed method has potential to facilitate the adoption of object detection models in industry as it is easy to adapt for new objects and highly flexible. Hence, it can result in significant costs reduction, gains in productivity and improved products quality.

* 8 pages, 4 figures, 2 tables, To appear in the proceedings of the 23rd International Conference on Enterprise Information Systems (ICEIS 2021) 

Multi-Echo LiDAR for 3D Object Detection

Jul 23, 2021
Yunze Man, Xinshuo Weng, Prasanna Kumar Sivakuma, Matthew O'Toole, Kris Kitani

LiDAR sensors can be used to obtain a wide range of measurement signals other than a simple 3D point cloud, and those signals can be leveraged to improve perception tasks like 3D object detection. A single laser pulse can be partially reflected by multiple objects along its path, resulting in multiple measurements called echoes. Multi-echo measurement can provide information about object contours and semi-transparent surfaces which can be used to better identify and locate objects. LiDAR can also measure surface reflectance (intensity of laser pulse return), as well as ambient light of the scene (sunlight reflected by objects). These signals are already available in commercial LiDAR devices but have not been used in most LiDAR-based detection models. We present a 3D object detection model which leverages the full spectrum of measurement signals provided by LiDAR. First, we propose a multi-signal fusion (MSF) module to combine (1) the reflectance and ambient features extracted with a 2D CNN, and (2) point cloud features extracted using a 3D graph neural network (GNN). Second, we propose a multi-echo aggregation (MEA) module to combine the information encoded in different set of echo points. Compared with traditional single echo point cloud methods, our proposed Multi-Signal LiDAR Detector (MSLiD) extracts richer context information from a wider range of sensing measurements and achieves more accurate 3D object detection. Experiments show that by incorporating the multi-modality of LiDAR, our method outperforms the state-of-the-art by up to 9.1%.