Shiguo Lian

Patch-wise Auto-Encoder for Visual Anomaly Detection

Aug 01, 2023
Yajie Cui, Zhaoxiang Liu, Shiguo Lian

Figures 1–3 for Patch-wise Auto-Encoder for Visual Anomaly Detection

Anomaly detection without priors of the anomalies is challenging. In unsupervised anomaly detection, the traditional auto-encoder (AE) rests on the assumption that a model trained only on normal images cannot reconstruct abnormal images correctly, an assumption that often fails in practice. On the contrary, we propose a novel patch-wise auto-encoder (Patch AE) framework, which aims at enhancing, rather than weakening, the AE's ability to reconstruct anomalies. Each image patch is reconstructed from the corresponding spatially distributed feature vector of the learned representation, i.e., patch-wise reconstruction, which ensures the anomaly sensitivity of the AE. Our method is simple and efficient. It advances the state-of-the-art performance on the MVTec AD benchmark, which demonstrates the effectiveness of our model, and it shows great potential in practical industrial application scenarios.
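
The patch-wise scoring idea can be sketched as follows. This is a minimal toy illustration, not the paper's actual network: the patch splitting, the `reconstruct` callback and the per-patch error are hypothetical stand-ins for the learned encoder/decoder.

```python
def split_into_patches(image, patch):
    """Split a 2-D list `image` into non-overlapping patch x patch blocks."""
    h, w = len(image), len(image[0])
    patches = []
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            patches.append([image[r][j:j + patch] for r in range(i, i + patch)])
    return patches

def patch_score(block, reconstruct):
    """Mean absolute reconstruction error of one patch."""
    rec = reconstruct(block)
    n = sum(len(row) for row in block)
    return sum(abs(a - b) for pr, rr in zip(block, rec)
               for a, b in zip(pr, rr)) / n

def anomaly_map(image, patch, reconstruct):
    """Per-patch anomaly scores, i.e. a coarse anomaly heat map: because
    each patch is reconstructed independently, a defect raises the error
    only of the patch that contains it."""
    return [patch_score(p, reconstruct) for p in split_into_patches(image, patch)]
```

With a reconstructor that always predicts the normal value, only the patch containing a defect receives a nonzero score, which is the localization behavior the abstract describes.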

* ICIP2023 accepted 

Semi-supervised Object Detection: A Survey on Recent Research and Progress

Jun 25, 2023
Yanyang Wang, Zhaoxiang Liu, Shiguo Lian

Figures 1–4 for Semi-supervised Object Detection: A Survey on Recent Research and Progress

In recent years, deep learning has been maturely applied in the field of object detection, and most algorithms are supervised. However, large amounts of labeled data require high human-resource costs, which brings low efficiency and limitations. Semi-supervised object detection (SSOD) has attracted more and more attention due to its high research value and practicability. It is designed to learn from small amounts of labeled data together with large amounts of unlabeled data. In this paper, we present a comprehensive and up-to-date survey of SSOD approaches from five aspects. We first briefly introduce several ways of data augmentation. Then, we divide the mainstream semi-supervised strategies into pseudo-label, consistency-regularization, graph-based and transfer-learning-based methods, and introduce some methods for challenging settings. We further present widely used loss functions, then outline the common benchmark datasets and compare the accuracy of different representative approaches. Finally, we conclude this paper and present some promising research directions for the future. Our survey aims to provide researchers and practitioners new to the field, as well as more advanced readers, with a solid understanding of the main approaches developed over the past few years.
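
As a concrete illustration of the pseudo-label family surveyed here, a hedged sketch (the threshold, data shapes and loss weighting are generic assumptions, not taken from any specific paper): a teacher model's confident detections on unlabeled images become pseudo ground truth for the student.

```python
def select_pseudo_labels(teacher_preds, threshold=0.9):
    """Keep teacher detections above the confidence threshold as pseudo
    ground truth. Each prediction is (class_name, confidence, box)."""
    return [(cls, box) for cls, conf, box in teacher_preds if conf >= threshold]

def combined_loss(supervised_loss, unsupervised_loss, weight=1.0):
    """Total training loss over labeled and pseudo-labeled data; `weight`
    balances the unsupervised term, as most pseudo-label methods do."""
    return supervised_loss + weight * unsupervised_loss
```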

* 10 pages, 20 figures, 2 tables 

Application-Driven AI Paradigm for Person Counting in Various Scenarios

Mar 24, 2023
Minjie Hua, Yibing Nan, Shiguo Lian

Figures 1–4 for Application-Driven AI Paradigm for Person Counting in Various Scenarios

Person counting is considered a fundamental task in video surveillance. However, the scenario diversity of practical applications makes it difficult to use a single person counting model for general purposes. Consequently, engineers must preview the video stream and manually specify an appropriate person counting model based on the scenario of the camera shot, which is time-consuming, especially for large-scale deployments. In this paper, we propose a person counting paradigm that utilizes a scenario classifier to automatically select a suitable person counting model for each captured frame. First, the input image is passed through the scenario classifier to obtain a scenario label, which is then used to allocate the frame to one of five fine-tuned counting models. Additionally, we present five augmentation datasets collected from different scenarios, including side-view, long-shot, top-view, customized and crowd, which are also integrated into a scenario classification dataset containing 26,323 samples. In our comparative experiments, the proposed paradigm achieves a better balance than any single model on the integrated dataset, demonstrating its generalization across various scenarios.
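
The classify-then-route step can be sketched as below; the classifier and the per-scenario counters are hypothetical placeholders for the paper's trained models.

```python
# The five scenario labels named in the abstract.
SCENARIOS = ("side-view", "long-shot", "top-view", "customized", "crowd")

def count_persons(frame, classify_scenario, counters):
    """Classify the frame's scenario, then delegate counting to the model
    fine-tuned for that scenario."""
    label = classify_scenario(frame)
    if label not in counters:
        raise ValueError(f"unknown scenario: {label}")
    return counters[label](frame)
```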


Application-Driven AI Paradigm for Hand-Held Action Detection

Oct 13, 2022
Kohou Wang, Zhaoxiang Liu, Shiguo Lian

Figures 1–4 for Application-Driven AI Paradigm for Hand-Held Action Detection

In practical applications, especially those with safety requirements, some hand-held actions need to be monitored closely, including smoking, dialing, eating, etc. Taking smoking as an example, existing smoke detection algorithms usually detect only the cigarette, or the cigarette together with the hand, as the target object, which leads to low accuracy. In this paper, we propose an application-driven AI paradigm for hand-held action detection based on hierarchical object detection. It is a coarse-to-fine hierarchical detection framework composed of two modules. The first is a coarse detection module that takes the human pose, consisting of the whole hand, cigarette and head, as the target object. The second is a fine detection module that targets the fingers holding the cigarette, the mouth area and the whole cigarette. Experiments on a dataset collected from real-world scenarios show that the proposed framework achieves a higher detection rate with good adaptability and robustness in complex environments.
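
The coarse-to-fine control flow can be sketched as follows; both detector callbacks are hypothetical stand-ins for the two trained modules.

```python
def detect_handheld_actions(image, coarse_detect, fine_detect):
    """Two-stage hierarchical detection: the coarse stage proposes
    hand/cigarette/head regions, the fine stage confirms the action
    inside each proposed region."""
    confirmed = []
    for region in coarse_detect(image):
        if fine_detect(region):
            confirmed.append(region)
    return confirmed
```

Running the fine detector only inside coarse proposals is what keeps the second stage focused on small targets such as fingers holding a cigarette.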


Vision-Based Defect Classification and Weight Estimation of Rice Kernels

Oct 06, 2022
Xiang Wang, Kai Wang, Xiaohong Li, Shiguo Lian

Figures 1–4 for Vision-Based Defect Classification and Weight Estimation of Rice Kernels

Rice is one of the main staple foods in many areas of the world. The quality estimation of rice kernels is crucial in terms of both food safety and socio-economic impact. In the past this was usually carried out by quality inspectors, which may result in both objective and subjective inaccuracies. In this paper, we present an automatic visual quality estimation system for rice kernels that classifies the sampled kernels according to their types of flaws and evaluates their quality via the weight ratios of the respective kernel types. To compensate for the imbalance in kernel numbers and to classify kernels with multiple flaws accurately, we propose a multi-stage workflow that locates the kernels in the captured image and classifies their properties. We define a novel metric that measures the relative weight of each kernel in the image from its area, such that the relative weight of each kernel type with regard to all samples can be computed and used as the basis for rice quality estimation. Various experiments show that our system outputs precise results in a contactless way and can replace tedious and error-prone manual work.
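
The area-based weight-ratio metric can be illustrated with a small sketch; approximating each type's relative weight by its share of total pixel area is our reading of the abstract, and the data shape is a hypothetical simplification.

```python
def weight_ratios(kernels):
    """kernels: iterable of (kernel_type, pixel_area) pairs, one per
    detected kernel. The relative weight of each type is approximated
    by its share of the total kernel area in the image."""
    total = sum(area for _, area in kernels)
    by_type = {}
    for ktype, area in kernels:
        by_type[ktype] = by_type.get(ktype, 0) + area
    return {k: v / total for k, v in by_type.items()}
```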

* 10 pages, 10 figures 

Application-Driven AI Paradigm for Human Action Recognition

Sep 30, 2022
Zezhou Chen, Yajie Cui, Kaikai Zhao, Zhaoxiang Liu, Shiguo Lian

Figures 1–4 for Application-Driven AI Paradigm for Human Action Recognition

Human action recognition in computer vision has been widely studied in recent years. However, most algorithms handle only certain specific actions, often at high computational cost, which is not suitable for practical applications where multiple actions must be identified at low computational cost. To meet various application scenarios, this paper presents a unified human action recognition framework composed of two modules, i.e., multi-form human detection and corresponding action classification. An open-source dataset is constructed to train a multi-form human detection model that distinguishes a human being's whole body, upper body or partial body, and the subsequent action classification model is adopted to recognize actions such as falling, sleeping or being on duty. Experimental results show that the unified framework is effective for various application scenarios. It is expected to be a new application-driven AI paradigm for human action recognition.


TAD: A Large-Scale Benchmark for Traffic Accidents Detection from Video Surveillance

Sep 26, 2022
Yajun Xu, Chuwen Huang, Yibing Nan, Shiguo Lian

Figures 1–4 for TAD: A Large-Scale Benchmark for Traffic Accidents Detection from Video Surveillance

Automatic traffic accident detection has appealed to the machine vision community due to its implications for the development of autonomous intelligent transportation systems (ITS) and its importance to traffic safety. Most previous studies on efficient analysis and prediction of traffic accidents, however, have used small-scale datasets with limited coverage, which limits their effect and applicability. Existing traffic accident datasets are either small-scale, not from surveillance cameras, not open-sourced, or not built for freeway scenes. Since freeway accidents tend to cause serious damage and happen too fast to be caught on the spot, an open-sourced dataset of freeway traffic accidents collected from surveillance cameras is in great need and of practical importance. To help the vision community address these shortcomings, we collected video data of real traffic accidents covering abundant scenes. After integration and annotation along various dimensions, a large-scale traffic accident dataset named TAD is proposed in this work. Various experiments on image classification, object detection, and video classification tasks, using public mainstream vision algorithms or frameworks, are conducted to demonstrate the performance of different methods. The proposed dataset together with the experimental results is presented as a new benchmark to advance computer vision research, especially in ITS.


Data-Centric AI Paradigm Based on Application-Driven Fine-grained Dataset Design

Sep 23, 2022
Huan Hu, Yajie Cui, Zhaoxiang Liu, Shiguo Lian

Figures 1–4 for Data-Centric AI Paradigm Based on Application-Driven Fine-grained Dataset Design

Deep learning has a wide range of applications in industrial scenarios, but reducing false alarms (FA) remains a major difficulty. Academic work typically tackles this challenge by optimizing network architectures or parameters, while ignoring the essential characteristics of the data in application scenarios, which often results in increased FA in new scenarios. In this paper, we propose a novel paradigm for fine-grained dataset design driven by industrial applications. We flexibly select positive and negative sample sets according to the essential features of the data and the application requirements, and add the remaining samples to the training set as uncertainty classes. We collect more than 10,000 mask-wearing recognition samples covering various application scenarios as our experimental data. Compared with traditional data design methods, our method achieves better results and effectively reduces FA. We make all contributions available to the research community for broader use at https://github.com/huh30/OpenDatasets.
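
The positive/negative/uncertainty split can be sketched as a partition step; the selector functions are hypothetical stand-ins for the application-driven rules the paper describes.

```python
def design_dataset(samples, is_positive, is_negative):
    """Partition raw samples into positive, negative and uncertainty sets
    according to application-driven selector functions; samples matching
    neither selector go to the uncertainty class."""
    positive, negative, uncertain = [], [], []
    for sample in samples:
        if is_positive(sample):
            positive.append(sample)
        elif is_negative(sample):
            negative.append(sample)
        else:
            uncertain.append(sample)
    return positive, negative, uncertain
```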



Unsupervised Industrial Anomaly Detection via Pattern Generative and Contrastive Networks

Jul 20, 2022
Jianfeng Huang, Chenyang Li, Yimin Lin, Shiguo Lian

Figures 1–4 for Unsupervised Industrial Anomaly Detection via Pattern Generative and Contrastive Networks

It is hard to collect enough flaw images for training deep learning networks in industrial production. Therefore, existing industrial anomaly detection methods prefer CNN-based unsupervised detection and localization networks. However, these methods often fail when variations appear in new signals, since traditional end-to-end networks struggle to fit nonlinear models in high-dimensional space. Moreover, they essentially build a memory library by clustering the features of normal images, which makes them not robust to texture changes. To this end, we propose a Vision Transformer based (ViT-based) unsupervised anomaly detection network. It utilizes hierarchical task learning and human experience to enhance its interpretability. Our network consists of a pattern generation network and a comparison network. The pattern generation network uses two ViT-based encoder modules to extract the features of two consecutive image patches, then uses a ViT-based decoder module to learn the human-designed style of these features and predict the third image patch. After this, we use a Siamese network to compute the similarity between the generated image patch and the original image patch. Finally, we refine the anomaly localization with a bi-directional inference strategy. Comparison experiments on the public MVTec dataset show that our method achieves 99.8% AUC, surpassing previous state-of-the-art methods. In addition, we give a qualitative illustration on our own leather and cloth datasets. The accurate segmentation results strongly prove the accuracy of our method in anomaly detection.
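
The generate-then-compare scheme can be sketched with toy stand-ins: a linear extrapolation plays the role of the ViT-based pattern generation network, and a mean-absolute-difference score plays the role of the Siamese comparison network. Both are illustrative assumptions, not the paper's models.

```python
def predict_third_patch(p1, p2):
    """Toy 'pattern generation': linearly extrapolate the third patch
    from two consecutive patches (stand-in for the ViT encoder/decoder)."""
    return [2 * b - a for a, b in zip(p1, p2)]

def similarity(p, q):
    """Toy 'comparison network': negative mean absolute difference
    (stand-in for the Siamese network)."""
    return -sum(abs(a - b) for a, b in zip(p, q)) / len(p)

def anomaly_score(p1, p2, p3):
    """High when the observed third patch deviates from the predicted one,
    i.e. when the local pattern breaks."""
    return -similarity(predict_third_patch(p1, p2), p3)
```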
