"Object Detection": models, code, and papers

Concealed Object Detection

Feb 20, 2021
Deng-Ping Fan, Ge-Peng Ji, Ming-Ming Cheng, Ling Shao

We present the first systematic study on concealed object detection (COD), which aims to identify objects that are "perfectly" embedded in their background. The high intrinsic similarities between the concealed objects and their background make COD far more challenging than traditional object detection/segmentation. To better understand this task, we collect a large-scale dataset, called COD10K, which consists of 10,000 images covering concealed objects in diverse real-world scenarios from 78 object categories. Further, we provide rich annotations including object categories, object boundaries, challenging attributes, object-level labels, and instance-level annotations. Our COD10K is the largest COD dataset to date, with the richest annotations, which enables comprehensive concealed object understanding and can even be used to help progress several other vision tasks, such as detection, segmentation, and classification. Motivated by how animals hunt in the wild, we also design a simple but strong baseline for COD, termed the Search Identification Network (SINet). Without any bells and whistles, SINet outperforms 12 cutting-edge baselines on all datasets tested, making it a robust, general architecture that could serve as a catalyst for future research in COD. Finally, we provide some interesting findings and highlight several potential applications and future directions. To spark research in this new field, our code, dataset, and online demo are available on our project page: http://mmcheng.net/cod.

* 17 pages, 27 figures, Code: https://github.com/GewelsJI/SINet-V2 
  

Class-Aware Robust Adversarial Training for Object Detection

Mar 31, 2021
Pin-Chun Chen, Bo-Han Kung, Jun-Cheng Chen

Object detection is an important computer vision task with plenty of real-world applications; therefore, how to enhance its robustness against adversarial attacks has emerged as a crucial issue. However, most previous defense methods focused on the classification task and offered little analysis in the context of object detection. In this work, to address the issue, we present a novel class-aware robust adversarial training paradigm for the object detection task. For a given image, the proposed approach generates a universal adversarial perturbation to simultaneously attack all the objects occurring in the image by jointly maximizing the respective loss for each object. Meanwhile, instead of normalizing the total loss by the number of objects, the proposed approach decomposes the total loss into class-wise losses and normalizes each class loss by the number of objects of that class. Adversarial training based on the class-weighted loss not only balances the influence of each class but also effectively and evenly improves the adversarial robustness of trained models for all the object classes compared with previous defense methods. Furthermore, building on the recent development of fast adversarial training, we provide a fast version of the proposed algorithm which can be trained faster than traditional adversarial training while keeping comparable performance. With extensive experiments on the challenging PASCAL-VOC and MS-COCO datasets, the evaluation results demonstrate that the proposed defense methods can effectively enhance the robustness of object detection models.
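
The difference between the two normalizations described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names are hypothetical, the per-object losses are made-up numbers, and summing the per-class averages (rather than, say, averaging them) is an assumption.

```python
from collections import defaultdict


def object_normalized_loss(losses):
    """Baseline: total loss divided by the number of objects in the image."""
    return sum(losses) / len(losses)


def class_normalized_loss(losses, labels):
    """Split the total loss by class and normalize each class loss by the
    number of objects of that class (assumption: the normalized class
    losses are then summed)."""
    per_class = defaultdict(list)
    for loss, label in zip(losses, labels):
        per_class[label].append(loss)
    return sum(sum(values) / len(values) for values in per_class.values())


# Illustrative values only: three "car" objects and one "person" object.
losses = [1.0, 1.0, 1.0, 4.0]
labels = ["car", "car", "car", "person"]

baseline = object_normalized_loss(losses)           # 7.0 / 4
class_aware = class_normalized_loss(losses, labels)  # 3.0 / 3 + 4.0 / 1
```

With the baseline, the frequent class contributes three of the four terms in the average; with the class-wise form, each class contributes one normalized term regardless of how many objects it has.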

  

Non-imaging real-time detection and tracking of fast-moving objects

Aug 13, 2021
Fengming Zhou, Xuelei Shi, Jie Chen, Tianhang Tang, Yiguang Liu

Real-time detection and tracking of fast-moving objects have achieved great success in various fields. However, many existing methods, especially low-cost ones, struggle to achieve real-time and long-term object detection and tracking. Here, a two-stage non-imaging strategy is proposed to realize fast-moving object detection and tracking in real time and over the long term: 1) a contour-moments-based method is proposed to optimize the Hadamard pattern sequence, and projection curves of the object are then reconstructed using single-pixel imaging technology. The projection curve, which includes the object location information, is reconstructed directly from the measurements collected by a single-pixel detector; 2) the fastest-changing position in the projection curve is obtained by solving first-order gradients. A gradient differential is applied to two first-order gradients to calculate a differential curve containing the sudden-change positions. Finally, we obtain the boundary information of the fast-moving object. We experimentally demonstrate that our approach can achieve a temporal resolution of 105 frames per second at a 1.28% sampling rate by using a 22,000 Hz digital micro-mirror device. The detection and tracking algorithm of the proposed strategy is computationally efficient. Compared with state-of-the-art methods, our approach achieves a lower sampling rate. Additionally, the strategy acquires no more than 1 MB of data for each frame, which makes it capable of real-time and long-term detection and tracking of fast-moving objects.
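
As a rough illustration of the second stage, the sketch below locates an object's extent in a 1D projection curve from its first-order differences. It is a simplified stand-in and not the paper's algorithm: the function names are hypothetical, the curve is synthetic, and it assumes a single object that is brighter than the background.

```python
def first_order_gradient(curve):
    """Forward differences of a 1D projection curve."""
    return [curve[i + 1] - curve[i] for i in range(len(curve) - 1)]


def object_extent(curve):
    """Return (start, end) sample indices of the object, taken from the
    steepest rise and the steepest fall of the curve.

    Assumption: one object, brighter than the background.
    """
    grad = first_order_gradient(curve)
    rise = max(range(len(grad)), key=lambda i: grad[i])
    fall = min(range(len(grad)), key=lambda i: grad[i])
    # grad[i] describes the step from sample i to i + 1, so the object
    # starts one sample after the rise and ends at the sample before the fall.
    return rise + 1, fall


# Synthetic projection curve: background 0, object occupying samples 3..6.
curve = [0, 0, 0, 5, 5, 5, 5, 0, 0, 0]
start, end = object_extent(curve)
```

Applying the same step to the projection curves along both image axes would give a bounding region for the object without reconstructing a full image.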

  

Achieving Real-Time Object Detection on Mobile Devices with Neural Pruning Search

Jun 28, 2021
Pu Zhao, Wei Niu, Geng Yuan, Yuxuan Cai, Bin Ren, Yanzhi Wang, Xue Lin

Object detection plays an important role in self-driving cars for security development. However, mobile systems on self-driving cars with limited computation resources lead to difficulties for object detection. To facilitate this, we propose a compiler-aware neural pruning search framework to achieve high-speed inference on autonomous vehicles for 2D and 3D object detection. The framework automatically searches the pruning scheme and rate for each layer to find a best-suited pruning for optimizing detection accuracy and speed performance under compiler optimization. Our experiments demonstrate that for the first time, the proposed method achieves (close-to) real-time, 55ms and 99ms inference times for YOLOv4 based 2D object detection and PointPillars based 3D detection, respectively, on an off-the-shelf mobile phone with minor (or no) accuracy loss.

* Presented at the HiPEAC 2021 workshop (cogarch 2021) 
  

Semi-supervised Learning for Dense Object Detection in Retail Scenes

Jul 05, 2021
Jaydeep Chauhan, Srikrishna Varadarajan, Muktabh Mayank Srivastava

Retail scenes usually contain a high number of densely packed objects in each image. Standard object detection techniques use a fully supervised training methodology. This is highly costly, as annotating a large, dense retail object detection dataset involves an order of magnitude more effort than annotating standard datasets. Hence, we propose semi-supervised learning to effectively use the large amount of unlabeled data available in the retail domain. We adapt a popular self-supervised method called noisy student, initially proposed for object classification, to the task of dense object detection. We show that by using unlabeled data with the noisy student training methodology, we can improve the state of the art on precise detection of objects in densely packed retail scenes. We also show that the performance of the model increases as the amount of unlabeled data increases.

  

ODDObjects: A Framework for Multiclass Unsupervised Anomaly Detection on Masked Objects

Apr 26, 2021
Ricky Ma

This paper presents a novel framework for unsupervised anomaly detection on masked objects called ODDObjects, which stands for Out-of-Distribution Detection on Objects. ODDObjects is designed to detect anomalies of various categories using unsupervised autoencoders trained on COCO-style datasets. The method utilizes autoencoder-based image reconstruction, where high reconstruction error indicates the possibility of an anomaly. The framework extends previous work on anomaly detection with autoencoders, comparing state-of-the-art models trained on object recognition datasets. Various model architectures were compared, and experimental results show that memory-augmented deep convolutional autoencoders perform the best at detecting out-of-distribution objects.

* 11 pages, 15 Postscript figures 
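
The reconstruction-error criterion mentioned in the ODDObjects abstract can be sketched as follows. This is a minimal illustration, not the paper's code: the autoencoder is replaced by a precomputed reconstruction, images are flattened lists of pixel values, and the mean-squared-error score, the function names, and the threshold value are all assumptions.

```python
def mean_squared_error(image, reconstruction):
    """Mean squared pixel difference between an input and its reconstruction."""
    if len(image) != len(reconstruction):
        raise ValueError("image and reconstruction must have the same size")
    return sum((a - b) ** 2 for a, b in zip(image, reconstruction)) / len(image)


def is_anomalous(image, reconstruction, threshold):
    """Flag the input as a possible anomaly when the error exceeds the threshold."""
    return mean_squared_error(image, reconstruction) > threshold


# Illustrative values only: a well-reconstructed input and a poorly
# reconstructed one, scored against a hypothetical threshold of 0.1.
well_reconstructed = is_anomalous([0.2, 0.4], [0.2, 0.4], threshold=0.1)
poorly_reconstructed = is_anomalous([0.0, 1.0], [0.0, 0.0], threshold=0.1)
```

In practice the threshold would be chosen on held-out in-distribution data; the value here is arbitrary.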
  

Pseudo-IoU: Improving Label Assignment in Anchor-Free Object Detection

Apr 29, 2021
Jiachen Li, Bowen Cheng, Rogerio Feris, Jinjun Xiong, Thomas S. Huang, Wen-Mei Hwu, Humphrey Shi

Current anchor-free object detectors are quite simple and effective yet lack accurate label assignment methods, which limits their potential in competing with classic anchor-based models that are supported by well-designed assignment methods based on the Intersection-over-Union (IoU) metric. In this paper, we present Pseudo-Intersection-over-Union (Pseudo-IoU): a simple metric that brings a more standardized and accurate assignment rule into anchor-free object detection frameworks without any additional computational cost or extra parameters for training and testing, making it possible to further improve anchor-free object detection by utilizing training samples of good quality under effective assignment rules that have been previously applied in anchor-based methods. By incorporating the Pseudo-IoU metric into an end-to-end single-stage anchor-free object detection framework, we observe consistent improvements in its performance on general object detection benchmarks such as PASCAL VOC and MSCOCO. Our method (single-model and single-scale) also achieves comparable performance to other recent state-of-the-art anchor-free methods without bells and whistles. Our code is based on the mmdetection toolbox and will be made publicly available at https://github.com/SHI-Labs/Pseudo-IoU-for-Anchor-Free-Object-Detection.

* CVPR 2021 Workshop 
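
For reference, the standard IoU metric that anchor-based assignment relies on can be computed as in the sketch below. This is the ordinary IoU for axis-aligned boxes, not the Pseudo-IoU proposed in the paper; the `(x1, y1, x2, y2)` box format and the function name are assumptions made for the example.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes given as
    (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: 1 / (4 + 4 - 1).
overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))
```

Anchor-based detectors typically label an anchor positive when its IoU with a ground-truth box exceeds a chosen threshold.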
  

PolarFormer: Multi-camera 3D Object Detection with Polar Transformers

Jul 12, 2022
Yanqin Jiang, Li Zhang, Zhenwei Miao, Xiatian Zhu, Jin Gao, Weiming Hu, Yu-Gang Jiang

3D object detection in autonomous driving aims to reason about "what" and "where" the objects of interest are in a 3D world. Following the conventional wisdom of previous 2D object detection, existing methods often adopt the canonical Cartesian coordinate system with perpendicular axes. However, we conjecture that this does not fit the nature of the ego car's perspective, as each onboard camera perceives the world in the shape of a wedge intrinsic to the imaging geometry with radial (non-perpendicular) axes. Hence, in this paper we advocate the exploitation of the Polar coordinate system and propose a new Polar Transformer (PolarFormer) for more accurate 3D object detection in the bird's-eye-view (BEV), taking as input only multi-camera 2D images. Specifically, we design a cross-attention-based Polar detection head without restriction on the shape of the input structure to deal with irregular Polar grids. To tackle the unconstrained object scale variations along Polar's distance dimension, we further introduce a multi-scale Polar representation learning strategy. As a result, our model can make best use of the Polar representation rasterized via attending to the corresponding image observation in a sequence-to-sequence fashion subject to the geometric constraints. Thorough experiments on the nuScenes dataset demonstrate that our PolarFormer significantly outperforms state-of-the-art 3D object detection alternatives, as well as yielding competitive performance on the BEV semantic segmentation task.
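
To make the coordinate change concrete, the sketch below maps a bird's-eye-view point from Cartesian to Polar coordinates and bins it into a Polar grid cell. It only illustrates the geometry and is not part of PolarFormer: the function names, the grid resolution, and the convention that the ego car sits at the origin are assumptions.

```python
import math


def to_polar(x, y):
    """Convert a BEV point (x, y), with the ego car at the origin,
    to (radius, azimuth in radians within [-pi, pi])."""
    return math.hypot(x, y), math.atan2(y, x)


def polar_cell(x, y, r_step, n_theta):
    """Index of the Polar grid cell containing (x, y).

    r_step is the radial bin width and n_theta the number of azimuth
    bins; both are hypothetical grid parameters.
    """
    radius, azimuth = to_polar(x, y)
    r_bin = int(radius / r_step)
    theta_step = 2 * math.pi / n_theta
    # Shift the azimuth to [0, 2*pi] and clamp the upper edge into the last bin.
    theta_bin = min(int((azimuth + math.pi) / theta_step), n_theta - 1)
    return r_bin, theta_bin


# A point 5 units from the ego car, on a grid with 1-unit rings and 8 sectors.
cell = polar_cell(3.0, 4.0, r_step=1.0, n_theta=8)
```

Cells at larger radii cover more area than cells near the origin, which matches the wedge-shaped view the abstract describes.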

  

PolarFormer: Multi-camera 3D Object Detection with Polar Transformer

Jun 30, 2022
Yanqin Jiang, Li Zhang, Zhenwei Miao, Xiatian Zhu, Jin Gao, Weiming Hu, Yu-Gang Jiang

3D object detection in autonomous driving aims to reason about "what" and "where" the objects of interest are in a 3D world. Following the conventional wisdom of previous 2D object detection, existing methods often adopt the canonical Cartesian coordinate system with perpendicular axes. However, we conjecture that this does not fit the nature of the ego car's perspective, as each onboard camera perceives the world in the shape of a wedge intrinsic to the imaging geometry with radial (non-perpendicular) axes. Hence, in this paper we advocate the exploitation of the Polar coordinate system and propose a new Polar Transformer (PolarFormer) for more accurate 3D object detection in the bird's-eye-view (BEV), taking as input only multi-camera 2D images. Specifically, we design a cross-attention-based Polar detection head without restriction on the shape of the input structure to deal with irregular Polar grids. To tackle the unconstrained object scale variations along Polar's distance dimension, we further introduce a multi-scale Polar representation learning strategy. As a result, our model can make best use of the Polar representation rasterized via attending to the corresponding image observation in a sequence-to-sequence fashion subject to the geometric constraints. Thorough experiments on the nuScenes dataset demonstrate that our PolarFormer significantly outperforms state-of-the-art 3D object detection alternatives, as well as yielding competitive performance on the BEV semantic segmentation task.

  