Sijie Zhu

Video Anomaly Detection for Smart Surveillance

Apr 02, 2020
Sijie Zhu, Chen Chen, Waqas Sultani

In modern intelligent video surveillance systems, automatic anomaly detection through computer vision analytics plays a pivotal role: it significantly increases monitoring efficiency and reduces the burden of live monitoring. Anomalies in videos are broadly defined as events or activities that are unusual and signify irregular behavior. The goal of anomaly detection is to temporally or spatially localize anomalous events in video sequences. Temporal localization (i.e., indicating the start and end frames of the anomalous event in a video) is referred to as frame-level detection. Spatial localization, which is more challenging, means identifying the pixels within each anomalous frame that correspond to the anomalous event; this setting is usually referred to as pixel-level detection. In this paper, we provide a brief overview of recent research progress on video anomaly detection and highlight a few future research directions.
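To make the frame-level setting concrete, here is a minimal sketch of how per-frame anomaly scores could be turned into start and end frames. The function name `anomaly_intervals`, the scores, and the 0.5 threshold are illustrative assumptions and are not taken from the paper.

```python
from typing import List, Tuple


def anomaly_intervals(scores: List[float], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) frame indices of runs with score >= threshold.

    Hypothetical helper: the threshold and the scoring model are assumptions.
    """
    intervals: List[Tuple[int, int]] = []
    start = None  # index where the current anomalous run began, if any
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i  # a new anomalous run begins
        elif score < threshold and start is not None:
            intervals.append((start, i - 1))  # the run ended on the previous frame
            start = None
    if start is not None:
        # The video ended while a run was still open.
        intervals.append((start, len(scores) - 1))
    return intervals


# Made-up per-frame scores for a 9-frame clip.
scores = [0.1, 0.2, 0.8, 0.9, 0.7, 0.3, 0.1, 0.6, 0.9]
print(anomaly_intervals(scores))  # [(2, 4), (7, 8)]
```

Pixel-level detection would additionally require a per-pixel score map for each flagged frame, which this sketch does not cover.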

A closer look at network resolution for efficient network design

Sep 27, 2019
Taojiannan Yang, Sijie Zhu, Shen Yan, Mi Zhang, Andrew Willis, Chen Chen

There is growing interest in designing lightweight neural networks for mobile and embedded vision applications. Previous works typically reduce computation at the structure level. For example, group-convolution-based methods reduce computation by factorizing a vanilla convolution into depth-wise and point-wise convolutions, while pruning-based methods remove redundant connections in the network structure. In this paper, we explore the importance of network input for achieving an optimal accuracy-efficiency trade-off. Reducing the input scale is a simple yet effective way to reduce computational cost. It does not require careful network module design, specific hardware optimization, or network retraining after pruning. Moreover, different input scales contain different representations to learn. We propose a framework to mutually learn from different input resolutions and network widths. With the shared knowledge, our framework is able to find a better width-resolution balance and capture multi-scale representations. It achieves consistently better ImageNet top-1 accuracy than US-Net under different computation constraints, and outperforms the best compound-scaled model of EfficientNet by 1.5%. The superiority of our framework is also validated on COCO object detection and instance segmentation, as well as on transfer learning.
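The two cost levers mentioned above can be illustrated with a rough multiply-accumulate (MAC) count for a single convolution layer. This is a back-of-the-envelope sketch, not the paper's framework: the helper names, the layer shape (32 input channels, 64 output channels, 3x3 kernel), and the resolutions 224 and 160 are illustrative assumptions, and bias terms and padding effects are ignored.

```python
def conv_macs(h: int, w: int, c_in: int, c_out: int, k: int) -> int:
    """MACs of a standard k x k convolution with stride 1 and 'same' padding."""
    return h * w * c_in * c_out * k * k


def separable_macs(h: int, w: int, c_in: int, c_out: int, k: int) -> int:
    """MACs of a depth-wise k x k convolution followed by a point-wise 1 x 1 convolution."""
    depthwise = h * w * c_in * k * k   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1 x 1 convolution mixes the channels
    return depthwise + pointwise


# Illustrative layer shape (an assumption, not from the paper).
c_in, c_out, k = 32, 64, 3

full = conv_macs(224, 224, c_in, c_out, k)
small = conv_macs(160, 160, c_in, c_out, k)
separable = separable_macs(224, 224, c_in, c_out, k)

# Cost scales with the number of spatial positions, i.e. quadratically in side length.
print(f"160 vs 224 input: {small / full:.3f} of the MACs")
# Factorization ratio equals 1/c_out + 1/k^2 for this layer.
print(f"separable vs standard: {separable / full:.3f} of the MACs")
```

Under these assumptions, shrinking the input side from 224 to 160 roughly halves the cost of the layer without touching its structure, which is why input scale is a natural knob to tune alongside network width.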

Visual Explanation for Deep Metric Learning

Sep 27, 2019
Sijie Zhu, Taojiannan Yang, Chen Chen

This work explores visual explanation for deep metric learning and its applications. As an important problem in representation learning, metric learning has attracted much attention recently, but the interpretation of such models is not as well studied as it is for classification. To this end, we propose an intuitive idea: show which regions contribute the most to the overall similarity of two input images by decomposing the final activation. Instead of only providing the overall activation map of each image, we propose to generate point-to-point activation intensity between two images so that the relationship between different regions is uncovered. We show that the proposed framework can be directly deployed to a wide range of metric learning applications and provides valuable information for understanding the model. Furthermore, our experiments show its effectiveness on two potential applications, i.e., cross-view pattern discovery and interactive retrieval.
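One simple way to see how an overall similarity can be decomposed into point-to-point terms is sketched below. It assumes each embedding is the global average pool of a final feature map and that similarity is a plain dot product; under those assumptions the similarity is exactly the sum of scaled dot products over all pairs of spatial positions. The abstract does not confirm that this is the authors' formulation, and the function names and toy features are hypothetical.

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def global_average_pool(features: List[Vector]) -> Vector:
    """Average the per-position feature vectors into one embedding."""
    n = len(features)
    return [sum(column) / n for column in zip(*features)]


def point_to_point(feats_a: List[Vector], feats_b: List[Vector]) -> List[List[float]]:
    """contrib[i][j] is the share of the similarity due to position i in A and j in B."""
    n, m = len(feats_a), len(feats_b)
    return [[dot(a, b) / (n * m) for b in feats_b] for a in feats_a]


# Toy feature maps, flattened over space: 3 positions in A, 2 in B, 2 channels each.
feats_a = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
feats_b = [[2.0, 1.0], [0.0, 3.0]]

overall = dot(global_average_pool(feats_a), global_average_pool(feats_b))
contrib = point_to_point(feats_a, feats_b)
total = sum(sum(row) for row in contrib)

# The pairwise terms add up to the overall similarity.
assert abs(overall - total) < 1e-9

# Summing over B's positions gives a per-position activation map for image A.
map_a = [sum(row) for row in contrib]
print(map_a)
```

Each row sum gives a per-image activation map, while individual entries expose which region of one image matches which region of the other.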