Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"Object Detection": models, code, and papers

Speedy Object Detection based on Shape

Jul 12, 2013
Y. Jayanta Singh, Shalu Gupta

This study is a part of design of an audio system for in-house object detection system for visually impaired, low vision personnel by birth or by an accident or due to old age. The input of the system will be scene and output as audio. Alert facility is provided based on severity levels of the objects (snake, broke glass etc) and also during difficulties. The study proposed techniques to provide speedy detection of objects based on shapes and its scale. Features are extraction to have minimum spaces using dynamic scaling. From a scene, clusters of objects are formed based on the scale and shape. Searching is performed among the clusters initially based on the shape, scale, mean cluster value and index of object(s). The minimum operation to detect the possible shape of the object is performed. In case the object does not have a likely matching shape, scale etc, then the several operations required for an object detection will not perform; instead, it will declared as a new object. In such way, this study finds a speedy way of detecting objects.

* The International Journal of Multimedia & Its Applications (IJMA) Vol.5, No.3, June 2013 
* arXiv admin note: text overlap with arXiv:1210.7038 by other authors 

Few-shot Object Detection via Feature Reweighting

Dec 05, 2018
Bingyi Kang, Zhuang Liu, Xin Wang, Fisher Yu, Jiashi Feng, Trevor Darrell

This work aims to solve the challenging few-shot object detection problem where only a few annotated examples are available for each object category to train a detection model. Such an ability of learning to detect an object from just a few examples is common for human vision systems, but remains absent for computer vision systems. Though few-shot meta learning offers a promising solution technique, previous works mostly target the task of image classification and are not directly applicable for the much more complicated object detection task. In this work, we propose a novel meta-learning based model with carefully designed architecture, which consists of a meta-model and a base detection model. The base detection model is trained on several base classes with sufficient samples to offer basis features. The meta-model is trained to reweight importance of features from the base detection model over the input image and adapt these features to assist novel object detection from a few examples. The meta-model is light-weight, end-to-end trainable and able to entail the base model with detection ability for novel objects fast. Through experiments we demonstrated our model can outperform baselines by a large margin for few-shot object detection, on multiple datasets and settings. Our model also exhibits fast adaptation speed to novel few-shot classes.


Open-Vocabulary Object Detection Using Captions

Nov 20, 2020
Alireza Zareian, Kevin Dela Rosa, Derek Hao Hu, Shih-Fu Chang

Despite the remarkable accuracy of deep neural networks in object detection, they are costly to train and scale due to supervision requirements. Particularly, learning more object categories typically requires proportionally more bounding box annotations. Weakly supervised and zero-shot learning techniques have been explored to scale object detectors to more categories with less supervision, but they have not been as successful and widely adopted as supervised models. In this paper, we put forth a novel formulation of the object detection problem, namely open-vocabulary object detection, which is more general, more practical, and more effective than weakly supervised and zero-shot approaches. We propose a new method to train object detectors using bounding box annotations for a limited set of object categories, as well as image-caption pairs that cover a larger variety of objects at a significantly lower cost. We show that the proposed method can detect and localize objects for which no bounding box annotation is provided during training, at a significantly higher accuracy than zero-shot approaches. Meanwhile, objects with bounding box annotation can be detected almost as accurately as supervised methods, which is significantly better than weakly supervised baselines. Accordingly, we establish a new state of the art for scalable object detection.


Detection Bank: An Object Detection Based Video Representation for Multimedia Event Recognition

Jun 14, 2014
Tim Althoff, Hyun Oh Song, Trevor Darrell

While low-level image features have proven to be effective representations for visual recognition tasks such as object recognition and scene classification, they are inadequate to capture complex semantic meaning required to solve high-level visual tasks such as multimedia event detection and recognition. Recognition or retrieval of events and activities can be improved if specific discriminative objects are detected in a video sequence. In this paper, we propose an image representation, called Detection Bank, based on the detection images from a large number of windowed object detectors where an image is represented by different statistics derived from these detections. This representation is extended to video by aggregating the key frame level image representations through mean and max pooling. We empirically show that it captures complementary information to state-of-the-art representations such as Spatial Pyramid Matching and Object Bank. These descriptors combined with our Detection Bank representation significantly outperforms any of the representations alone on TRECVID MED 2011 data.

* ACM Multimedia 2012 

Depth-Guided Camouflaged Object Detection

Jun 26, 2021
Jing Zhang, Yunqiu Lv, Mochu Xiang, Aixuan Li, Yuchao Dai, Yiran Zhong

Camouflaged object detection (COD) aims to segment camouflaged objects hiding in the environment, which is challenging due to the similar appearance of camouflaged objects and their surroundings. Research in biology suggests that depth can provide useful object localization cues for camouflaged object discovery, as all the animals have 3D perception ability. However, the depth information has not been exploited for camouflaged object detection. To explore the contribution of depth for camouflage detection, we present a depth-guided camouflaged object detection network with pre-computed depth maps from existing monocular depth estimation methods. Due to the domain gap between the depth estimation dataset and our camouflaged object detection dataset, the generated depth may not be accurate enough to be directly used in our framework. We then introduce a depth quality assessment module to evaluate the quality of depth based on the model prediction from both RGB COD branch and RGB-D COD branch. During training, only high-quality depth is used to update the modal interaction module for multi-modal learning. During testing, our depth quality assessment module can effectively determine the contribution of depth and select the RGB branch or RGB-D branch for camouflage prediction. Extensive experiments on various camouflaged object detection datasets prove the effectiveness of our solution in exploring the depth information for camouflaged object detection. Our code and data is publicly available at: \url{}.

* 10 pages main content + 3 pages reference. The first work in RGB-D Camouflaged object detection (COD) 

A Simple and Effective Use of Object-Centric Images for Long-Tailed Object Detection

Feb 17, 2021
Cheng Zhang, Tai-Yu Pan, Yandong Li, Hexiang Hu, Dong Xuan, Soravit Changpinyo, Boqing Gong, Wei-Lun Chao

Object frequencies in daily scenes follow a long-tailed distribution. Many objects do not appear frequently enough in scene-centric images (e.g., sightseeing, street views) for us to train accurate object detectors. In contrast, these objects are captured at a higher frequency in object-centric images, which are intended to picture the objects of interest. Motivated by this phenomenon, we propose to take advantage of the object-centric images to improve object detection in scene-centric images. We present a simple yet surprisingly effective framework to do so. On the one hand, our approach turns an object-centric image into a useful training example for object detection in scene-centric images by mitigating the domain gap between the two image sources in both the input and label space. On the other hand, our approach employs a multi-stage procedure to train the object detector, such that the detector learns the diverse object appearances from object-centric images while being tied to the application domain of scene-centric images. On the LVIS dataset, our approach can improve the object detection (and instance segmentation) accuracy of rare objects by 50% (and 33%) relatively, without sacrificing the performance of other classes.


Comprehensive Analysis of the Object Detection Pipeline on UAVs

Mar 01, 2022
Leon Amadeus Varga, Sebastian Koch, Andreas Zell

An object detection pipeline comprises a camera that captures the scene and an object detector that processes these images. The quality of the images directly affects the performance of the object detector. Many works nowadays focus either on improving the image quality or improving the object detection models independently, but neglect the importance of joint optimization of the two subsystems. In this paper, we first empirically analyze the influence of seven parameters (quantization, compression, resolution, color model, image distortion, gamma correction, additional channels) in remote sensing applications. For our experiments, we utilize three UAV data sets from different domains and a mixture of large and small state-of-the-art object detector models to provide an extensive evaluation of the influence of the pipeline parameters. Additionally, we realize an object detection pipeline prototype on an embedded platform for an UAV and give a best practice recommendation for building object detection pipelines based on our findings. We show that not all parameters have an equal impact on detection accuracy and data throughput, and that by using a suitable compromise between parameters we are able to improve detection accuracy for lightweight object detection models, while keeping the same data throughput.

* Submitted IROS22 

Object Detection with Deep Learning: A Review

Jul 15, 2018
Zhong-Qiu Zhao, Peng Zheng, Shou-tao Xu, Xindong Wu

Due to object detection's close relationship with video analysis and image understanding, it has attracted much research attention in recent years. Traditional object detection methods are built on handcrafted features and shallow trainable architectures. Their performance easily stagnates by constructing complex ensembles which combine multiple low-level image features with high-level context from object detectors and scene classifiers. With the rapid development in deep learning, more powerful tools, which are able to learn semantic, high-level, deeper features, are introduced to address the problems existing in traditional architectures. These models behave differently in network architecture, training strategy and optimization function, etc. In this paper, we provide a review on deep learning based object detection frameworks. Our review begins with a brief introduction on the history of deep learning and its representative tool, namely Convolutional Neural Network (CNN). Then we focus on typical generic object detection architectures along with some modifications and useful tricks to improve detection performance further. As distinct specific detection tasks exhibit different characteristics, we also briefly survey several specific tasks, including salient object detection, face detection and pedestrian detection. Experimental analyses are also provided to compare various methods and draw some meaningful conclusions. Finally, several promising directions and tasks are provided to serve as guidelines for future work in both object detection and relevant neural network based learning systems.


Multi-Channel CNN-based Object Detection for Enhanced Situation Awareness

Nov 30, 2017
Shuo Liu, Zheng Liu

Object Detection is critical for automatic military operations. However, the performance of current object detection algorithms is deficient in terms of the requirements in military scenarios. This is mainly because the object presence is hard to detect due to the indistinguishable appearance and dramatic changes of object's size which is determined by the distance to the detection sensors. Recent advances in deep learning have achieved promising results in many challenging tasks. The state-of-the-art in object detection is represented by convolutional neural networks (CNNs), such as the fast R-CNN algorithm. These CNN-based methods improve the detection performance significantly on several public generic object detection datasets. However, their performance on detecting small objects or undistinguishable objects in visible spectrum images is still insufficient. In this study, we propose a novel detection algorithm for military objects by fusing multi-channel CNNs. We combine spatial, temporal and thermal information by generating a three-channel image, and they will be fused as CNN feature maps in an unsupervised manner. The backbone of our object detection framework is from the fast R-CNN algorithm, and we utilize cross-domain transfer learning technique to fine-tune the CNN model on generated multi-channel images. In the experiments, we validated the proposed method with the images from SENSIAC (Military Sensing Information Analysis Centre) database and compared it with the state-of-the-art. The experimental results demonstrated the effectiveness of the proposed method on both accuracy and computational efficiency.

* Published at the Sensors & Electronics Technology (SET) panel Symposium SET-241 on 9th NATO Military Sensing Symposium 

SWA Object Detection

Dec 25, 2020
Haoyang Zhang, Ying Wang, Feras Dayoub, Niko Sünderhauf

Do you want to improve 1.0 AP for your object detector without any inference cost and any change to your detector? Let us tell you such a recipe. It is surprisingly simple: train your detector for an extra 12 epochs using cyclical learning rates and then average these 12 checkpoints as your final detection model. This potent recipe is inspired by Stochastic Weights Averaging (SWA), which is proposed in arXiv:1803.05407 for improving generalization in deep neural networks. We found it also very effective in object detection. In this technique report, we systematically investigate the effects of applying SWA to object detection as well as instance segmentation. Through extensive experiments, we discover a good policy of performing SWA in object detection, and we consistently achieve $\sim$1.0 AP improvement over various popular detectors on the challenging COCO benchmark. We hope this work will make more researchers in object detection know this technique and help them train better object detectors. Code is available at: .

* 9 pages; add results