"Object Detection": models, code, and papers

Aligning Pretraining for Detection via Object-Level Contrastive Learning

Jun 04, 2021
Fangyun Wei, Yue Gao, Zhirong Wu, Han Hu, Stephen Lin

Image-level contrastive representation learning has proven to be highly effective as a generic model for transfer learning. Such generality for transfer learning, however, sacrifices specificity if we are interested in a certain downstream task. We argue that this could be sub-optimal and thus advocate a design principle which encourages alignment between the self-supervised pretext task and the downstream task. In this paper, we follow this principle with a pretraining method specifically designed for the task of object detection. We attain alignment in the following three aspects: 1) object-level representations are introduced via selective search bounding boxes as object proposals; 2) the pretraining network architecture incorporates the same dedicated modules used in the detection pipeline (e.g. FPN); 3) the pretraining is equipped with object detection properties such as object-level translation invariance and scale invariance. Our method, called Selective Object COntrastive learning (SoCo), achieves state-of-the-art results for transfer performance on COCO detection using a Mask R-CNN framework. Code and models will be made available.


A Benchmark dataset for both underwater image enhancement and underwater object detection

Jun 29, 2020
Long Chen, Lei Tong, Feixiang Zhou, Zheheng Jiang, Zhenyang Li, Jialin Lv, Junyu Dong, Huiyu Zhou

Underwater image enhancement is an important vision task due to its significance in marine engineering and aquatic robotics. It usually serves as a pre-processing step to improve the performance of high-level vision tasks such as underwater object detection. Even though many previous works show that underwater image enhancement algorithms can boost the detection accuracy of detectors, no work has specifically investigated the relationship between these two tasks. This is mainly because existing underwater datasets lack either bounding box annotations or high-quality reference images, on which detection accuracy and image quality assessment metrics, respectively, are calculated. To investigate how underwater image enhancement methods influence the subsequent underwater object detection task, in this paper we provide a large-scale underwater object detection dataset with both bounding box annotations and high-quality reference images, namely the OUC dataset. The OUC dataset provides a platform for researchers to comprehensively study the influence of underwater image enhancement algorithms on the underwater object detection task.
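
The abstract does not name the specific metrics used, so the following is only a minimal sketch of two standard choices, showing why both annotation types are needed: intersection over union (IoU) requires a ground-truth bounding box, and peak signal-to-noise ratio (PSNR) requires a reference image. The function names and the list-of-rows image layout are assumptions made for illustration.

```python
import math


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width and height of the overlap are clamped at zero for disjoint boxes.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def psnr(enhanced, reference, max_value=255.0):
    """PSNR in dB between two equally sized grayscale images (lists of rows)."""
    squared_errors = [
        (e - r) ** 2
        for row_e, row_r in zip(enhanced, reference)
        for e, r in zip(row_e, row_r)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    # Identical images have zero error, so PSNR is unbounded.
    return math.inf if mse == 0 else 10.0 * math.log10(max_value ** 2 / mse)
```

A predicted box is typically counted as correct when its IoU with an annotated box exceeds a chosen threshold, while PSNR scores the enhanced image directly against its reference.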


Localized Vision-Language Matching for Open-vocabulary Object Detection

May 12, 2022
Maria A. Bravo, Sudhanshu Mittal, Thomas Brox

In this work, we propose an open-world object detection method that, based on image-caption pairs, learns to detect novel object classes along with a given set of known classes. It is a two-stage training approach that first uses a location-guided image-caption matching technique to learn class labels for both novel and known classes in a weakly-supervised manner and second specializes the model for the object detection task using known class annotations. We show that a simple language model fits better than a large contextualized language model for detecting novel objects. Moreover, we introduce a consistency-regularization technique to better exploit image-caption pair information. Our method compares favorably to existing open-world detection approaches while being data-efficient.


cmSalGAN: RGB-D Salient Object Detection with Cross-View Generative Adversarial Networks

Dec 21, 2019
Bo Jiang, Zitai Zhou, Xiao Wang, Jin Tang

Image salient object detection (SOD) is an active research topic in computer vision and multimedia. Fusing the complementary information of RGB and depth has been demonstrated to be effective for image salient object detection, which is known as the RGB-D salient object detection problem. The main challenge for RGB-D salient object detection is how to exploit the salient cues of both intra-modality (RGB, depth) and cross-modality simultaneously, which is known as the cross-modality detection problem. In this paper, we tackle this challenge by designing a novel cross-modality Saliency Generative Adversarial Network (cmSalGAN). cmSalGAN aims to learn an optimal view-invariant and consistent pixel-level representation for RGB and depth images via a novel adversarial learning framework, which thus incorporates both intra-view information and cross-view correlation information simultaneously for the RGB-D saliency detection problem. To further improve the detection results, an attention mechanism and an edge detection module are also incorporated into cmSalGAN. The entire cmSalGAN can be trained in an end-to-end manner using a standard deep neural network framework. Experimental results show that cmSalGAN achieves new state-of-the-art RGB-D saliency detection performance on several benchmark datasets.

* Submitted to IEEE Transactions on Multimedia 

A novel method for object detection using deep learning and CAD models

Feb 12, 2021
Igor Garcia Ballhausen Sampaio, Luigy Machaca, José Viterbo, Joris Guérin

Object Detection (OD) is an important computer vision problem for industry; it can be used for quality control on production lines, among other applications. Recently, Deep Learning (DL) methods have enabled practitioners to train OD models that perform well on complex real-world images. However, the adoption of these models in industry is still limited by the difficulty and the significant cost of collecting high-quality training datasets. On the other hand, when applying OD in the context of production lines, CAD models of the objects to be detected are often available. In this paper, we introduce a fully automated method that uses a CAD model of an object and returns a fully trained OD model for detecting this object. To do this, we created a Blender script that generates realistic labeled datasets of images containing the object, which are then used for training the OD model. The method is validated experimentally on two practical examples, showing that this approach can generate OD models that perform well on real images while being trained only on synthetic images. The proposed method has the potential to facilitate the adoption of object detection models in industry, as it is easy to adapt to new objects and highly flexible. Hence, it can result in significant cost reductions, gains in productivity and improved product quality.
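
The abstract does not describe how the Blender script derives its labels. One common way to label a rendered object automatically is to project the model's 3D vertices into the image and take their extent; the sketch below assumes a pinhole camera with vertices already expressed in camera coordinates (z pointing forward), and every name in it is hypothetical rather than taken from the authors' script.

```python
def project_point(point, focal_px, cx, cy):
    """Pinhole projection of a camera-space point (x, y, z) to pixel (u, v)."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    return (focal_px * x / z + cx, focal_px * y / z + cy)


def bbox_from_vertices(vertices, focal_px, cx, cy, width, height):
    """Axis-aligned 2D box (x1, y1, x2, y2) enclosing all projected vertices.

    The box is clipped to the image bounds so it stays a valid label even
    when the object is partly outside the frame.
    """
    pixels = [project_point(v, focal_px, cx, cy) for v in vertices]
    us = [p[0] for p in pixels]
    vs = [p[1] for p in pixels]
    x1 = min(max(min(us), 0.0), float(width))
    y1 = min(max(min(vs), 0.0), float(height))
    x2 = min(max(max(us), 0.0), float(width))
    y2 = min(max(max(vs), 0.0), float(height))
    return (x1, y1, x2, y2)
```

Because the label comes from the same geometry that is rendered, no manual annotation is needed, which is the property that makes a synthetic pipeline of this kind fully automatic.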

* 8 pages, 4 figures, 2 tables, To appear in the proceedings of the 23rd International Conference on Enterprise Information Systems (ICEIS 2021) 

Multi-Echo LiDAR for 3D Object Detection

Jul 23, 2021
Yunze Man, Xinshuo Weng, Prasanna Kumar Sivakuma, Matthew O'Toole, Kris Kitani

LiDAR sensors can be used to obtain a wide range of measurement signals other than a simple 3D point cloud, and those signals can be leveraged to improve perception tasks like 3D object detection. A single laser pulse can be partially reflected by multiple objects along its path, resulting in multiple measurements called echoes. Multi-echo measurement can provide information about object contours and semi-transparent surfaces, which can be used to better identify and locate objects. LiDAR can also measure surface reflectance (intensity of laser pulse return), as well as ambient light of the scene (sunlight reflected by objects). These signals are already available in commercial LiDAR devices but have not been used in most LiDAR-based detection models. We present a 3D object detection model which leverages the full spectrum of measurement signals provided by LiDAR. First, we propose a multi-signal fusion (MSF) module to combine (1) the reflectance and ambient features extracted with a 2D CNN, and (2) point cloud features extracted using a 3D graph neural network (GNN). Second, we propose a multi-echo aggregation (MEA) module to combine the information encoded in different sets of echo points. Compared with traditional single-echo point cloud methods, our proposed Multi-Signal LiDAR Detector (MSLiD) extracts richer context information from a wider range of sensing measurements and achieves more accurate 3D object detection. Experiments show that by incorporating the multi-modality of LiDAR, our method outperforms the state-of-the-art by up to 9.1%.


Minimum Delay Object Detection From Video

Aug 29, 2019
Dong Lao, Ganesh Sundaramoorthi

We consider the problem of detecting objects, as they come into view, from videos in an online fashion. We provide the first real-time solution that is guaranteed to minimize the delay, i.e., the time between when the object comes into view and the declared detection time, subject to acceptable levels of detection accuracy. The method leverages modern CNN-based object detectors that operate on a single frame, aggregating detection results over frames to provide reliable detection at a rate specified by the user, with guaranteed minimal delay. To do this, we formulate the problem as a Quickest Detection problem, which provides the aforementioned guarantees. We derive our algorithms from this theory. We show in experiments that, with an overhead of just 50 fps, we can increase the number of correct detections and decrease the overall computational cost compared to running a modern single-frame detector.
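
To make the Quickest Detection framing concrete, here is a minimal sketch of CUSUM, a classic quickest-detection procedure. It is not the authors' algorithm, whose details the abstract does not give. It assumes that per-frame detector scores are Gaussian with a known mean before and after the object appears and a shared standard deviation; the function and parameter names are hypothetical.

```python
def cusum_detection_frame(scores, mean_absent, mean_present, sigma, threshold):
    """Return the index of the first frame at which CUSUM declares a detection.

    scores: per-frame confidence values from a single-frame detector.
    Returns None if the statistic never reaches the threshold.
    """
    statistic = 0.0
    for frame_index, score in enumerate(scores):
        # Gaussian log-likelihood ratio of "object present" vs "object absent".
        llr = ((score - mean_absent) ** 2 - (score - mean_present) ** 2) / (
            2.0 * sigma ** 2
        )
        # Accumulate evidence over frames, resetting at zero so that a long
        # run of "absent" frames cannot delay a later detection.
        statistic = max(0.0, statistic + llr)
        if statistic >= threshold:
            return frame_index
    return None
```

The threshold sets the trade-off described in the abstract: raising it lowers the false-alarm rate but lengthens the delay between the object's appearance and the declared detection.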

* ICCV 2019 

RoarNet: A Robust 3D Object Detection based on RegiOn Approximation Refinement

Nov 09, 2018
Kiwoo Shin, Youngwook Paul Kwon, Masayoshi Tomizuka

We present RoarNet, a new approach for 3D object detection from a 2D image and 3D Lidar point clouds. Based on a two-stage object detection framework with PointNet as our backbone network, we suggest several novel ideas to improve 3D object detection performance. The first part of our method, RoarNet_2D, estimates the 3D poses of objects from a monocular image, which approximates where to examine further, and derives multiple candidates that are geometrically feasible. This step significantly narrows down feasible 3D regions, which would otherwise require demanding processing of 3D point clouds in a huge search space. Then the second part, RoarNet_3D, takes the candidate regions and conducts in-depth inferences to conclude final poses in a recursive manner. Inspired by PointNet, RoarNet_3D processes 3D point clouds directly without any loss of data, leading to precise detection. We evaluate our method on KITTI, a 3D object detection benchmark. Our results show that RoarNet has superior performance to state-of-the-art methods that are publicly available. Remarkably, RoarNet also outperforms state-of-the-art methods even in settings where Lidar and camera are not time synchronized, which is practically important for actual driving environments. RoarNet is implemented in TensorFlow and publicly available with pre-trained models.

* 7 pages, 8 figures, 2 tables 

Hybrid Optimized Deep Convolution Neural Network based Learning Model for Object Detection

Mar 02, 2022
Venkata Beri

Object identification is one of the most fundamental and difficult problems in computer vision. It aims to discover object instances in real pictures from a large number of predefined categories. In recent years, deep learning-based object detection techniques have attracted wide interest. Object recognition methods based on deep learning frameworks have quickly become a popular way to interpret moving images acquired by various sensors. Because of its wide variety of applications in computer vision tasks such as activity or event detection, content-based image retrieval, and scene understanding, researchers have spent decades attempting to solve this problem. With this goal in mind, a unique deep learning classification technique is used to create an autonomous object detection system. The first steps are noise removal and normalisation, carried out using a Gaussian filter and contrast normalisation, respectively. The pre-processed picture is then subjected to entropy-based segmentation algorithms, which separate the image's significant areas in order to distinguish between distinct instances. Classification is performed by the proposed Hybrid Optimized Dense Convolutional Neural Network (HODCNN). The major goal of this framework is to aid the precise recognition of distinct objects in the gathered input frames. The proposed system's performance is assessed by comparing it with existing machine learning and deep learning methodologies. The experimental findings show that the proposed framework achieves a detection accuracy of 0.9864, which is higher than that of current techniques.

* 23 Pages, 7 Figures 

Pseudo-LiDAR++: Accurate Depth for 3D Object Detection in Autonomous Driving

Jun 14, 2019
Yurong You, Yan Wang, Wei-Lun Chao, Divyansh Garg, Geoff Pleiss, Bharath Hariharan, Mark Campbell, Kilian Q. Weinberger

Detecting objects such as cars and pedestrians in 3D plays an indispensable role in autonomous driving. Existing approaches largely rely on expensive LiDAR sensors for accurate depth information. While pseudo-LiDAR has recently been introduced as a promising alternative, at a much lower cost based solely on stereo images, there is still a notable performance gap. In this paper we provide substantial advances to the pseudo-LiDAR framework through improvements in stereo depth estimation. Concretely, we adapt the stereo network architecture and loss function to be more aligned with accurate depth estimation of far-away objects (currently the primary weakness of pseudo-LiDAR). Further, we explore the idea of leveraging cheaper but extremely sparse LiDAR sensors, which alone provide insufficient information for 3D detection, to de-bias our depth estimation. We propose a depth-propagation algorithm, guided by the initial depth estimates, to diffuse these few exact measurements across the entire depth map. We show on the KITTI object detection benchmark that our combined approach yields substantial improvements in depth estimation and stereo-based 3D object detection, outperforming the previous state-of-the-art detection accuracy for far-away objects by 40%. Our code will be publicly available at
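
The weakness on far-away objects follows from standard stereo geometry, which the sketch below illustrates; it is not the authors' network or loss. With depth Z = f·b/d for focal length f, baseline b and disparity d, a fixed disparity error produces a depth error of roughly Z²/(f·b), which grows quadratically with distance. The function names and the calibration values in the usage note are hypothetical.

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Depth in metres from stereo disparity: Z = f * b / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def backproject(u, v, depth_m, focal_px, cx, cy):
    """Lift pixel (u, v) with known depth to a 3D point in the camera frame.

    Applying this to every pixel of a depth map yields the point cloud that
    the pseudo-LiDAR representation is built from.
    """
    x = (u - cx) * depth_m / focal_px
    y = (v - cy) * depth_m / focal_px
    return (x, y, depth_m)


def depth_error(depth_m, focal_px, baseline_m, disparity_error_px):
    """First-order depth error caused by a small disparity error.

    From |dZ/dd| = f * b / d**2 = Z**2 / (f * b).
    """
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_error_px
```

With a hypothetical focal length of 700 px and baseline of 0.5 m, a one-pixel disparity error corresponds to about 0.29 m at 10 m but about 4.6 m at 40 m, a sixteenfold increase for a fourfold increase in distance.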
