Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"Object Detection": models, code, and papers

KOLOMVERSE: KRISO open large-scale image dataset for object detection in the maritime universe

Jun 20, 2022
Abhilasha Nanda, Sung Won Cho, Hyeopwoo Lee, Jin Hyoung Park

Over the years, datasets have been developed for various object detection tasks. Object detection in the maritime domain is essential for the safety and navigation of ships. However, there is still a lack of publicly available large-scale datasets in the maritime domain. To overcome this challenge, we present KOLOMVERSE, an open large-scale image dataset for object detection in the maritime domain by KRISO (Korea Research Institute of Ships and Ocean Engineering). We collected 5,845 hours of video data captured from 21 territorial waters of South Korea. Through an elaborate data quality assessment process, we gathered around 2,151,470 4K resolution images from the video data. This dataset considers various environments: weather, time, illumination, occlusion, viewpoint, background, wind speed, and visibility. The KOLOMVERSE consists of five classes (ship, buoy, fishnet buoy, lighthouse and wind farm) for maritime object detection. The dataset has images of 3840$\times$2160 pixels and to our knowledge, it is by far the largest publicly available dataset for object detection in the maritime domain. We performed object detection experiments and evaluated our dataset on several pre-trained state-of-the-art architectures to show the effectiveness and usefulness of our dataset. The dataset is available at: \url{}.

* 13 Pages, 12 figures, submitted to NeurIPS 2022 Datasets and Benchmarks Track (Under Review) 

Augmentation for small object detection

Feb 19, 2019
Mate Kisantal, Zbigniew Wojna, Jakub Murawski, Jacek Naruniec, Kyunghyun Cho

In recent years, object detection has experienced impressive progress. Despite these improvements, there is still a significant gap in the performance between the detection of small and large objects. We analyze the current state-of-the-art model, Mask-RCNN, on a challenging dataset, MS COCO. We show that the overlap between small ground-truth objects and the predicted anchors is much lower than the expected IoU threshold. We conjecture this is due to two factors; (1) only a few images are containing small objects, and (2) small objects do not appear enough even within each image containing them. We thus propose to oversample those images with small objects and augment each of those images by copy-pasting small objects many times. It allows us to trade off the quality of the detector on large objects with that on small objects. We evaluate different pasting augmentation strategies, and ultimately, we achieve 9.7\% relative improvement on the instance segmentation and 7.1\% on the object detection of small objects, compared to the current state of the art method on MS COCO.


Towards PAC Multi-Object Detection and Tracking

Apr 15, 2022
Shuo Li, Sangdon Park, Xiayan Ji, Insup Lee, Osbert Bastani

Accurately detecting and tracking multi-objects is important for safety-critical applications such as autonomous navigation. However, it remains challenging to provide guarantees on the performance of state-of-the-art techniques based on deep learning. We consider a strategy known as conformal prediction, which predicts sets of labels instead of a single label; in the classification and regression settings, these algorithms can guarantee that the true label lies within the prediction set with high probability. Building on these ideas, we propose multi-object detection and tracking algorithms that come with probably approximately correct (PAC) guarantees. They do so by constructing both a prediction set around each object detection as well as around the set of edge transitions; given an object, the detection prediction set contains its true bounding box with high probability, and the edge prediction set contains its true transition across frames with high probability. We empirically demonstrate that our method can detect and track objects with PAC guarantees on the COCO and MOT-17 datasets.

* 15 pages, 4 figures, 2 tables 

Efficient Decoder-free Object Detection with Transformers

Jun 17, 2022
Peixian Chen, Mengdan Zhang, Yunhang Shen, Kekai Sheng, Yuting Gao, Xing Sun, Ke Li, Chunhua Shen

Vision transformers (ViTs) are changing the landscape of object detection approaches. A natural usage of ViTs in detection is to replace the CNN-based backbone with a transformer-based backbone, which is straightforward and effective, with the price of bringing considerable computation burden for inference. More subtle usage is the DETR family, which eliminates the need for many hand-designed components in object detection but introduces a decoder demanding an extra-long time to converge. As a result, transformer-based object detection can not prevail in large-scale applications. To overcome these issues, we propose a novel decoder-free fully transformer-based (DFFT) object detector, achieving high efficiency in both training and inference stages, for the first time. We simplify objection detection into an encoder-only single-level anchor-based dense prediction problem by centering around two entry points: 1) Eliminate the training-inefficient decoder and leverage two strong encoders to preserve the accuracy of single-level feature map prediction; 2) Explore low-level semantic features for the detection task with limited computational resources. In particular, we design a novel lightweight detection-oriented transformer backbone that efficiently captures low-level features with rich semantics based on a well-conceived ablation study. Extensive experiments on the MS COCO benchmark demonstrate that DFFT_SMALL outperforms DETR by 2.5% AP with 28% computation cost reduction and more than $10$x fewer training epochs. Compared with the cutting-edge anchor-based detector RetinaNet, DFFT_SMALL obtains over 5.5% AP gain while cutting down 70% computation cost.

* Update metadata, 10 pages 

3D-FCT: Simultaneous 3D Object Detection and Tracking Using Feature Correlation

Oct 06, 2021
Naman Sharma, Hocksoon Lim

3D object detection using LiDAR data remains a key task for applications like autonomous driving and robotics. Unlike in the case of 2D images, LiDAR data is almost always collected over a period of time. However, most work in this area has focused on performing detection independent of the temporal domain. In this paper we present 3D-FCT, a Siamese network architecture that utilizes temporal information to simultaneously perform the related tasks of 3D object detection and tracking. The network is trained to predict the movement of an object based on the correlation features of extracted keypoints across time. Calculating correlation across keypoints only allows for real-time object detection. We further extend the multi-task objective to include a tracking regression loss. Finally, we produce high accuracy detections by linking short-term object tracklets into long term tracks based on the predicted tracks. Our proposed method is evaluated on the KITTI tracking dataset where it is shown to provide an improvement of 5.57% mAP over a state-of-the-art approach.


Joint 3D Object Detection and Tracking Using Spatio-Temporal Representation of Camera Image and LiDAR Point Clouds

Dec 15, 2021
Junho Koh, Jaekyum Kim, Jinhyuk Yoo, Yecheol Kim, Dongsuk Kum, Jun Won Choi

In this paper, we propose a new joint object detection and tracking (JoDT) framework for 3D object detection and tracking based on camera and LiDAR sensors. The proposed method, referred to as 3D DetecTrack, enables the detector and tracker to cooperate to generate a spatio-temporal representation of the camera and LiDAR data, with which 3D object detection and tracking are then performed. The detector constructs the spatio-temporal features via the weighted temporal aggregation of the spatial features obtained by the camera and LiDAR fusion. Then, the detector reconfigures the initial detection results using information from the tracklets maintained up to the previous time step. Based on the spatio-temporal features generated by the detector, the tracker associates the detected objects with previously tracked objects using a graph neural network (GNN). We devise a fully-connected GNN facilitated by a combination of rule-based edge pruning and attention-based edge gating, which exploits both spatial and temporal object contexts to improve tracking performance. The experiments conducted on both KITTI and nuScenes benchmarks demonstrate that the proposed 3D DetecTrack achieves significant improvements in both detection and tracking performances over baseline methods and achieves state-of-the-art performance among existing methods through collaboration between the detector and tracker.


Max-Margin Object Detection

Jan 31, 2015
Davis E. King

Most object detection methods operate by applying a binary classifier to sub-windows of an image, followed by a non-maximum suppression step where detections on overlapping sub-windows are removed. Since the number of possible sub-windows in even moderately sized image datasets is extremely large, the classifier is typically learned from only a subset of the windows. This avoids the computational difficulty of dealing with the entire set of sub-windows, however, as we will show in this paper, it leads to sub-optimal detector performance. In particular, the main contribution of this paper is the introduction of a new method, Max-Margin Object Detection (MMOD), for learning to detect objects in images. This method does not perform any sub-sampling, but instead optimizes over all sub-windows. MMOD can be used to improve any object detection method which is linear in the learned parameters, such as HOG or bag-of-visual-word models. Using this approach we show substantial performance gains on three publicly available datasets. Strikingly, we show that a single rigid HOG filter can outperform a state-of-the-art deformable part model on the Face Detection Data Set and Benchmark when the HOG filter is learned via MMOD.


UOLO - automatic object detection and segmentation in biomedical images

Oct 09, 2018
Teresa Araújo, Guilherme Aresta, Adrian Galdran, Pedro Costa, Ana Maria Mendonça, Aurélio Campilho

We propose UOLO, a novel framework for the simultaneous detection and segmentation of structures of interest in medical images. UOLO consists of an object segmentation module which intermediate abstract representations are processed and used as input for object detection. The resulting system is optimized simultaneously for detecting a class of objects and segmenting an optionally different class of structures. UOLO is trained on a set of bounding boxes enclosing the objects to detect, as well as pixel-wise segmentation information, when available. A new loss function is devised, taking into account whether a reference segmentation is accessible for each training image, in order to suitably backpropagate the error. We validate UOLO on the task of simultaneous optic disc (OD) detection, fovea detection, and OD segmentation from retinal images, achieving state-of-the-art performance on public datasets.

* 4th International Workshop, DLMIA 2018, and 8th International Workshop, ML-CDS 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, September 20, 2018, Proceedings. 165-173 
* Publised on DLMIA 2018. Licensed under the Creative Commons CC-BY-NC-ND 4.0 license: 

Multi-patch Feature Pyramid Network for Weakly Supervised Object Detection in Optical Remote Sensing Images

Aug 18, 2021
Pourya Shamsolmoali, Jocelyn Chanussot, Masoumeh Zareapoor, Huiyu Zhou, Jie Yang

Object detection is a challenging task in remote sensing because objects only occupy a few pixels in the images, and the models are required to simultaneously learn object locations and detection. Even though the established approaches well perform for the objects of regular sizes, they achieve weak performance when analyzing small ones or getting stuck in the local minima (e.g. false object parts). Two possible issues stand in their way. First, the existing methods struggle to perform stably on the detection of small objects because of the complicated background. Second, most of the standard methods used hand-crafted features, and do not work well on the detection of objects parts of which are missing. We here address the above issues and propose a new architecture with a multiple patch feature pyramid network (MPFP-Net). Different from the current models that during training only pursue the most discriminative patches, in MPFPNet the patches are divided into class-affiliated subsets, in which the patches are related and based on the primary loss function, a sequence of smooth loss functions are determined for the subsets to improve the model for collecting small object parts. To enhance the feature representation for patch selection, we introduce an effective method to regularize the residual values and make the fusion transition layers strictly norm-preserving. The network contains bottom-up and crosswise connections to fuse the features of different scales to achieve better accuracy, compared to several state-of-the-art object detection models. Also, the developed architecture is more efficient than the baselines.