"Object Detection": models, code, and papers

Towards Object Detection from Motion

Sep 17, 2019
Rico Jonschkowski, Austin Stone

We present a novel approach to weakly supervised object detection. Instead of annotated images, our method only requires two short videos to learn to detect a new object: 1) a video of a moving object and 2) one or more "negative" videos of the scene without the object. The key idea of our algorithm is to train the object detector to produce physically plausible object motion when applied to the first video and to not detect anything in the second video. With this approach, our method learns to locate objects without any object location annotations. Once the model is trained, it performs object detection on single images. We evaluate our method in three robotics settings that afford learning objects from motion: observing moving objects, watching demonstrations of object manipulation, and physically interacting with objects (see a video summary at


Mixed Supervised Object Detection with Robust Objectness Transfer

Mar 13, 2018
Yan Li, Junge Zhang, Kaiqi Huang, Jianguo Zhang

In this paper, we consider the problem of leveraging existing fully labeled categories to improve the weakly supervised detection (WSD) of new object categories, which we refer to as mixed supervised detection (MSD). Different from previous MSD methods that directly transfer the pre-trained object detectors from existing categories to new categories, we propose a more reasonable and robust objectness transfer approach for MSD. In our framework, we first learn domain-invariant objectness knowledge from the existing fully labeled categories. The knowledge is modeled based on invariant features that are robust to the distribution discrepancy between the existing categories and new categories; therefore the resulting knowledge would generalize well to new categories and could assist detection models to reject distractors (e.g., object parts) in weakly labeled images of new categories. Under the guidance of learned objectness knowledge, we utilize multiple instance learning (MIL) to model the concepts of both objects and distractors and to further improve the ability of rejecting distractors in weakly labeled images. Our robust objectness transfer approach outperforms the existing MSD methods, and achieves state-of-the-art results on the challenging ILSVRC2013 detection dataset and the PASCAL VOC datasets.

* IEEE Transactions on Pattern Analysis and Machine Intelligence (2018). Together with Supplementary Materials 

Object Detection by Spatio-Temporal Analysis and Tracking of the Detected Objects in a Video with Variable Background

Apr 28, 2017
Kumar S. Ray, Vijayan K. Asari, Soma Chakraborty

In this paper we propose a novel approach for detecting and tracking objects in videos with variable background, i.e., videos captured by moving cameras without any additional sensor. In a video captured by a moving camera, both the background and the foreground change in each frame of the image sequence. For these videos, modeling a single background with traditional background modeling methods is infeasible, and thus detecting the actual moving object in a variable background is a challenging task. To detect the actual moving object, spatio-temporal blobs are generated in each frame by spatio-temporal analysis of the image sequence using a three-dimensional Gabor filter. Individual blobs that are parts of one object are then merged using a Minimum Spanning Tree to form the moving object in the variable background. The height, width, and four-bin gray-value histogram of the object are calculated as its features, and the object is tracked in each frame using these features to generate its trajectory through the video sequence. The data association problem during tracking is solved as a Linear Assignment Problem, and occlusion is handled by applying a Kalman filter. The major advantage of our method over most existing tracking algorithms is that it does not require initialization in the first frame or training on sample data. The algorithm has been tested on benchmark videos with very satisfactory results, and its performance is comparable or superior to that of some benchmark algorithms.
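The data association step described in this abstract can be illustrated with a small sketch. It assumes each object is summarized by height, width, and a four-bin gray-value histogram, as the abstract states. The distance metric (sum of absolute differences), the function names, and the brute-force solver are illustrative assumptions, not the authors' implementation; a practical system would likely use a dedicated assignment solver.

```python
from itertools import permutations


def feature_vector(height, width, histogram):
    """Concatenate height, width, and a four-bin gray-value histogram."""
    return [float(height), float(width)] + [float(h) for h in histogram]


def cost(a, b):
    """Sum of absolute differences between two feature vectors (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def associate(tracks, detections):
    """Return the lowest-cost one-to-one assignment of tracks to detections.

    Brute force over permutations, so only suitable for a handful of objects.
    Assumes len(tracks) <= len(detections).
    """
    best_pairs, best_cost = [], float("inf")
    for perm in permutations(range(len(detections)), len(tracks)):
        total = sum(cost(tracks[i], detections[j]) for i, j in enumerate(perm))
        if total < best_cost:
            best_cost = total
            best_pairs = list(enumerate(perm))
    return best_pairs, best_cost


# Hypothetical features from the previous frame (tracks) and current frame (detections).
tracks = [
    feature_vector(40, 20, [0.1, 0.2, 0.3, 0.4]),
    feature_vector(80, 30, [0.4, 0.3, 0.2, 0.1]),
]
detections = [
    feature_vector(79, 31, [0.4, 0.3, 0.2, 0.1]),
    feature_vector(41, 20, [0.1, 0.2, 0.3, 0.4]),
]
pairs, total = associate(tracks, detections)
print(pairs, total)
```

Here each pair is (track index, detection index); the tall object from the previous frame is matched to the tall detection even though the detections arrive in a different order.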


Few-Shot Object Detection and Viewpoint Estimation for Objects in the Wild

Jul 23, 2020
Yang Xiao, Renaud Marlet

Detecting objects and estimating their viewpoint in images are key tasks of 3D scene understanding. Recent approaches have achieved excellent results on very large benchmarks for object detection and viewpoint estimation. However, performance still lags behind for novel object categories with few samples. In this paper, we tackle the problems of few-shot object detection and few-shot viewpoint estimation. We propose a meta-learning framework that can be applied to both tasks, possibly including 3D data. Our models improve the results on objects of novel classes by leveraging rich feature information originating from base classes with many samples. A simple joint feature embedding module is proposed to make the most of this feature sharing. Despite its simplicity, our method outperforms state-of-the-art methods by a large margin on a range of datasets, including PASCAL VOC and MS COCO for few-shot object detection, and Pascal3D+ and ObjectNet3D for few-shot viewpoint estimation. For the first time, we also tackle the combination of both few-shot tasks on ObjectNet3D, showing promising results. Our code and data are available at

* Accepted as Poster at ECCV 2020, project website: 

Object Detection in Videos by High Quality Object Linking

Jun 10, 2018
Peng Tang, Chunyu Wang, Xinggang Wang, Wenyu Liu, Wenjun Zeng, Jingdong Wang

Compared with object detection in static images, object detection in videos is more challenging due to degraded image quality. An effective way to address this problem is to exploit temporal context by linking the same object across the video to form tubelets and aggregating classification scores in the tubelets. In this paper, we focus on obtaining high quality object linking results for better classification. Unlike previous methods that link objects by checking boxes between neighboring frames, we propose to link in the same frame. To achieve this goal, we extend prior methods in the following aspects: (1) a cuboid proposal network that extracts spatio-temporal candidate cuboids which bound the movement of objects; (2) a short tubelet detection network that detects short tubelets in short video segments; (3) a short tubelet linking algorithm that links temporally-overlapping short tubelets to form long tubelets. Experiments on the ImageNet VID dataset show that our method outperforms both the static image detector and the previous state of the art. In particular, our method improves results by 8.8% over the static image detector for fast moving objects.
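The idea of linking temporally-overlapping short tubelets by comparing boxes in the same frame can be sketched as follows. This is a minimal illustration under stated assumptions: tubelets are represented as dictionaries mapping frame index to a box `(x1, y1, x2, y2)`, the overlap score is the mean IoU over shared frames, and merging is greedy with a hypothetical threshold of 0.5. None of these details come from the abstract.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def overlap_score(t1, t2):
    """Mean IoU over the frames present in both tubelets; 0.0 if none are shared."""
    shared = sorted(set(t1) & set(t2))
    if not shared:
        return 0.0
    return sum(iou(t1[f], t2[f]) for f in shared) / len(shared)


def link_tubelets(short_tubelets, threshold=0.5):
    """Greedily merge non-empty short tubelets, in order of start frame.

    A short tubelet joins the existing long tubelet with the highest
    same-frame overlap score, provided the score reaches the threshold.
    """
    long_tubelets = []
    for tubelet in sorted(short_tubelets, key=lambda t: min(t)):
        best, best_score = None, threshold
        for candidate in long_tubelets:
            score = overlap_score(candidate, tubelet)
            if score >= best_score:
                best, best_score = candidate, score
        if best is None:
            long_tubelets.append(dict(tubelet))
        else:
            for frame, box in tubelet.items():
                best.setdefault(frame, box)  # keep the earlier box on shared frames
    return long_tubelets


# Two short tubelets of one object sharing frame 1, plus an unrelated object.
a = {0: (0, 0, 10, 10), 1: (1, 0, 11, 10)}
b = {1: (1, 0, 11, 10), 2: (2, 0, 12, 10)}
c = {1: (50, 50, 60, 60), 2: (51, 50, 61, 60)}
linked = link_tubelets([a, b, c])
print([sorted(t) for t in linked])
```

Because the comparison is made on frames that both tubelets cover, object motion between neighboring frames does not lower the overlap score.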


Slender Object Detection: Diagnoses and Improvements

Nov 21, 2020
Zhaoyi Wan, Yimin Chen, Sutao Deng, Cong Yao, Jiebo Luo

In this paper, we are concerned with the detection of a particular type of object with extreme aspect ratios, namely slender objects. In real-world scenarios as well as widely-used datasets (such as COCO), slender objects are actually very common. However, this type of object has been largely overlooked by previous object detection algorithms. Upon our investigation, for a classical object detection method, a drastic drop of 18.9% mAP on COCO is observed if solely evaluated on slender objects. Therefore, we systematically study the problem of slender object detection in this work. Accordingly, an analytical framework with carefully designed benchmark and evaluation protocols is established, in which different algorithms and modules can be inspected and compared. Our key findings include: 1) the essential role of anchors in label assignment; 2) the descriptive capability of the 2-point representation; 3) the crucial strategies for improving the detection of slender objects and regular objects. Our work identifies and extends the insights of existing methods that were previously underexploited. Furthermore, we propose a feature adaption strategy that achieves clear and consistent improvements over current representative object detection methods. In particular, a natural and effective extension of the center prior, which leads to a significant improvement on slender objects, is devised. We believe this work opens up new opportunities and calibrates ablation standards for future research in the field of object detection.


TrackNet: Simultaneous Object Detection and Tracking and Its Application in Traffic Video Analysis

Feb 04, 2019
Chenge Li, Gregory Dobler, Xin Feng, Yao Wang

Object detection and object tracking are usually treated as two separate processes. Significant progress has been made for object detection in 2D images using deep learning networks. The usual tracking-by-detection pipeline for object tracking requires that the object is successfully detected in the first frame and all subsequent frames, and tracking is done by associating detection results. Performing object detection and object tracking through a single network remains a challenging open question. We propose a novel network structure named trackNet that can directly detect a 3D tube enclosing a moving object in a video segment by extending the faster R-CNN framework. A Tube Proposal Network (TPN) inside the trackNet is proposed to predict the objectness of each candidate tube and location parameters specifying the bounding tube. The proposed framework is applicable for detecting and tracking any object and in this paper, we focus on its application for traffic video analysis. The proposed model is trained and tested on UA-DETRAC, a large traffic video dataset available for multi-vehicle detection and tracking, and obtained very promising results.


Plug & Play Convolutional Regression Tracker for Video Object Detection

Mar 02, 2020
Ye Lyu, Michael Ying Yang, George Vosselman, Gui-Song Xia

Video object detection aims to simultaneously localize the bounding boxes of the objects and identify their classes in a given video. One challenge for video object detection is to consistently detect all objects across the whole video. As the appearance of objects may deteriorate in some frames, features or detections from other frames are commonly used to enhance the prediction. In this paper, we propose a Plug & Play scale-adaptive convolutional regression tracker for the video object detection task, which can be easily and compatibly integrated into current state-of-the-art detection networks. As the tracker reuses the features from the detector, it is a very lightweight addition to the detection network. The whole network runs at a speed close to that of a standard object detector. With our new video object detection pipeline design, image object detectors can be easily turned into efficient video object detectors without modifying any parameters. The performance is evaluated on the large-scale ImageNet VID dataset. Our Plug & Play design improves the mAP score of the image detector by around 5% with only a small drop in speed.


TricubeNet: 2D Kernel-Based Object Representation for Weakly-Occluded Oriented Object Detection

Apr 23, 2021
Beomyoung Kim, Janghyeon Lee, Sihaeng Lee, Doyeon Kim, Junmo Kim

We present a new approach for oriented object detection, an anchor-free one-stage detector. This approach, named TricubeNet, represents each object as a 2D Tricube kernel and extracts bounding boxes using appearance-based post-processing. Unlike existing anchor-based oriented object detectors, we can reduce the computational complexity and the number of hyperparameters by eliminating the anchor box in the network design. In addition, by adopting a heatmap-based detection process instead of box offset regression, we simply and effectively solve the angle discontinuity problem, which is one of the important problems for oriented object detection. To further boost the performance, we propose some effective techniques for loss balancing, extracting the rotation-invariant feature, and heatmap refinement. To demonstrate the effectiveness of our TricubeNet, we experiment on various tasks for weakly-occluded oriented object detection. The extensive experimental results show that our TricubeNet is highly effective and competitive for oriented object detection. The code is available at
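For readers unfamiliar with the tricube function, the sketch below evaluates the standard unnormalized 1D tricube weight, (1 - |u|^3)^3 for |u| < 1, and combines two of them along the axes of a rotated box. The separable product form, the parameter names, and the scaling by half the box width and height are assumptions made for illustration; the abstract does not specify how TricubeNet constructs its 2D kernel.

```python
import math


def tricube(u):
    """Unnormalized 1D tricube weight: (1 - |u|^3)^3 inside |u| < 1, else 0."""
    a = abs(u)
    return (1.0 - a ** 3) ** 3 if a < 1.0 else 0.0


def tricube_2d(x, y, cx, cy, w, h, angle):
    """Value at (x, y) of an assumed separable tricube kernel for a rotated box.

    (cx, cy) is the box center, w and h its width and height, and angle its
    rotation in radians. The kernel peaks at 1.0 in the center and falls to
    0.0 at the box edges.
    """
    dx, dy = x - cx, y - cy
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    # Express the offset in the box's own coordinate frame, scaled to [-1, 1].
    u = (dx * cos_a + dy * sin_a) / (w / 2.0)
    v = (-dx * sin_a + dy * cos_a) / (h / 2.0)
    return tricube(u) * tricube(v)


center_value = tricube_2d(5, 5, 5, 5, 4, 2, 0.3)
halfway_value = tricube_2d(6, 5, 5, 5, 4, 2, 0.0)
outside_value = tricube_2d(9, 5, 5, 5, 4, 2, 0.0)
print(center_value, halfway_value, outside_value)
```

A heatmap built this way encodes position, size, and orientation in its shape, which is one plausible reading of how a heatmap-based process can avoid regressing the angle directly.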


Few-Shot Object Detection with Attention-RPN and Multi-Relation Detector

Aug 06, 2019
Qi Fan, Wei Zhuo, Yu-Wing Tai

Conventional methods for object detection usually require a substantial amount of training data, and preparing such high quality training data is labor intensive. In this paper, we propose few-shot object detection, which aims to detect objects of an unseen class with a few training examples. Central to our method are the Attention-RPN and the multi-relation module, which fully exploit the similarity between the few-shot training examples and the test set to detect novel objects while suppressing false detections in the background. To train our network, we have prepared a new dataset which contains 1000 categories of various objects with high quality annotations. To the best of our knowledge, this is also the first dataset specifically designed for few-shot object detection. Once our network is trained, we can apply object detection to unseen classes without further training or fine-tuning. This is also the major advantage of few-shot object detection. Our method is general and has a wide range of applications. We demonstrate the effectiveness of our method quantitatively and qualitatively on different datasets. The dataset link is: