Peng Wu

VadCLIP: Adapting Vision-Language Models for Weakly Supervised Video Anomaly Detection

Aug 25, 2023
Peng Wu, Xuerong Zhou, Guansong Pang, Lingru Zhou, Qingsen Yan, Peng Wang, Yanning Zhang

The recent contrastive language-image pre-training (CLIP) model has shown great success in a wide range of image-level tasks, revealing a remarkable ability to learn powerful visual representations with rich semantics. An open and worthwhile problem is how to efficiently adapt such a strong model to the video domain and design a robust video anomaly detector. In this work, we propose VadCLIP, a new paradigm for weakly supervised video anomaly detection (WSVAD) that leverages the frozen CLIP model directly, without any pre-training or fine-tuning. Unlike current works that directly feed extracted features into a weakly supervised classifier for frame-level binary classification, VadCLIP makes full use of the fine-grained associations between vision and language afforded by CLIP and involves a dual branch. One branch simply utilizes visual features for coarse-grained binary classification, while the other fully leverages fine-grained language-image alignment. Benefiting from the dual branch, VadCLIP achieves both coarse-grained and fine-grained video anomaly detection by transferring pre-trained knowledge from CLIP to the WSVAD task. We conduct extensive experiments on two commonly used benchmarks, demonstrating that VadCLIP achieves the best performance on both coarse-grained and fine-grained WSVAD, surpassing state-of-the-art methods by a large margin. Specifically, VadCLIP achieves 84.51% AP and 88.02% AUC on XD-Violence and UCF-Crime, respectively. Code and features will be released to facilitate future VAD research.

* Submitted 
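
To make the dual-branch design concrete, here is a minimal PyTorch sketch of how a coarse binary branch and a fine-grained language-image alignment branch could operate on frozen CLIP features; the layer sizes, class count, and the learnable stand-in for CLIP text embeddings are illustrative assumptions, not the authors' exact architecture.

import torch
import torch.nn as nn

class DualBranchVAD(nn.Module):
    # Sketch of a dual-branch WSVAD head over precomputed frozen CLIP features.
    # Hypothetical sizes: 512-d features, 7 anomaly classes.
    def __init__(self, dim=512, num_classes=7):
        super().__init__()
        self.binary_head = nn.Linear(dim, 1)  # coarse branch: frame-level anomaly score
        # Stand-in for fixed CLIP text embeddings of class prompts.
        self.text_embed = nn.Parameter(torch.randn(num_classes, dim))

    def forward(self, frame_feats):  # frame_feats: (T, dim) frozen CLIP visual features
        coarse = torch.sigmoid(self.binary_head(frame_feats)).squeeze(-1)  # (T,)
        v = frame_feats / frame_feats.norm(dim=-1, keepdim=True)
        t = self.text_embed / self.text_embed.norm(dim=-1, keepdim=True)
        fine = v @ t.T  # (T, num_classes) fine-grained language-image alignment logits
        return coarse, fine

feats = torch.randn(16, 512)  # 16 frames of 512-d CLIP features
coarse, fine = DualBranchVAD()(feats)
print(coarse.shape, fine.shape)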

Robust Interference Mitigation techniques for Direct Position Estimation

Aug 09, 2023
Haoqing Li, Shuo Tang, Peng Wu, Pau Closas

Global Navigation Satellite Systems (GNSS) are pervasive in navigation and positioning applications that require precise position and time referencing. Conventional GNSS positioning follows a two-step process: intermediate measurements such as the Doppler shift and time delay of the received GNSS signals are computed and then used to solve for the receiver's position. Alternatively, Direct Position Estimation (DPE) was proposed to infer the position directly from the sampled signal without intermediate variables, yielding superior sensitivity and operation in challenging environments. However, the positioning resilience of DPE is still threatened by various interferences. Robust Interference Mitigation (RIM) processing has been studied and shown to be effective against various interferences in conventional two-step positioning (2SP) methods, and is therefore worth exploring for its potential to enhance DPE. This article extends the DPE methodology by incorporating RIM strategies that address the increasing need to protect GNSS receivers against intentional or unintentional interferences, such as jamming signals, which can deny GNSS-based positioning. RIM, which leverages robust statistics, was shown to provide competitive results in two-step approaches and is here employed in a high-sensitivity DPE framework with successful results. The article also quantifies the loss of efficiency of using RIM when no interference is present, and validates the proposed methodology on relevant interference cases, while the approach can be used to mitigate other common interference signals.

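As a flavor of the robust statistics underlying RIM, the Python sketch below compares a least-squares cost with a Huber cost on residuals contaminated by a jamming burst; the numbers are made up, and it only illustrates why a bounded-influence loss resists outliers, not the article's actual DPE cost function.

import numpy as np

def huber(r, k=1.345):
    # Huber rho: quadratic near zero, linear in the tails, so a jamming
    # burst contributes a bounded (linear) penalty rather than a quadratic one.
    a = np.abs(r)
    return np.where(a <= k, 0.5 * r**2, k * a - 0.5 * k**2)

rng = np.random.default_rng(0)
residuals = rng.normal(0.0, 1.0, 1000)  # nominal receiver noise
residuals[:50] += 30.0                  # hypothetical jamming burst hitting 5% of samples

print("least-squares cost:", np.sum(0.5 * residuals**2))  # dominated by the outliers
print("robust (Huber) cost:", np.sum(huber(residuals)))   # outlier influence bounded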

Towards Video Anomaly Retrieval from Video Anomaly Detection: New Benchmarks and Model

Jul 24, 2023
Peng Wu, Jing Liu, Xiangteng He, Yuxin Peng, Peng Wang, Yanning Zhang

Video anomaly detection (VAD) has received increasing attention due to its potential applications. Its current dominant tasks focus on detecting anomalies online at the frame level, which can be roughly interpreted as binary or multi-class event classification. However, such a setup, which builds relationships between complicated anomalous events and single labels, e.g., "vandalism", is superficial, since single labels are insufficient to characterize anomalous events. In reality, users tend to search for a specific video rather than a series of approximate videos. Therefore, retrieving anomalous events using detailed descriptions is practical and valuable, but little research has focused on it. In this context, we propose a novel task called Video Anomaly Retrieval (VAR), which aims to pragmatically retrieve relevant anomalous videos via cross-modal queries, e.g., language descriptions and synchronous audios. Unlike current video retrieval, where videos are assumed to be temporally well-trimmed and of short duration, VAR is devised to retrieve long untrimmed videos that may be only partially relevant to the given query. To achieve this, we present two large-scale VAR benchmarks, UCFCrime-AR and XDViolence-AR, constructed on top of prevalent anomaly datasets. Meanwhile, we design a model called the Anomaly-Led Alignment Network (ALAN) for VAR. In ALAN, we propose anomaly-led sampling to focus on key segments in long untrimmed videos. Then, we introduce an efficient pretext task to enhance semantic associations between fine-grained video-text representations. Besides, we leverage two complementary alignments to further match cross-modal contents. Experimental results on the two benchmarks reveal the challenges of the VAR task and demonstrate the advantages of our tailored method.

* This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible 
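
A minimal sketch of the anomaly-led sampling idea, assuming precomputed segment features and anomaly scores from any off-the-shelf scorer; the top-k selection with temporal re-sorting is a plausible reading of the abstract, not ALAN's exact procedure.

import torch

def anomaly_led_sampling(segment_feats, anomaly_scores, k=8):
    # Keep the k segments with the highest anomaly scores so that a long
    # untrimmed video is represented by its key segments; re-sort the
    # selected indices to preserve temporal order.
    idx = torch.topk(anomaly_scores, k).indices.sort().values
    return segment_feats[idx], idx

feats = torch.randn(200, 512)  # 200 segments of a long untrimmed video
scores = torch.rand(200)       # per-segment anomaly scores
key_feats, idx = anomaly_led_sampling(feats, scores)
print(key_feats.shape, idx[:5])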

Jammer classification with Federated Learning

Jun 05, 2023
Peng Wu, Helena Calatrava, Tales Imbiriba, Pau Closas

Jamming signals can jeopardize the operation of GNSS receivers to the point of denying it altogether. Given their ubiquity, jamming mitigation and localization techniques are of crucial importance, and jammer classification helps in both. Data-driven models have proven useful in detecting these threats, but training them on crowdsourced data still poses challenges when it comes to private data sharing. This article investigates the use of federated learning to train jamming signal classifiers locally on each device, with model updates aggregated and averaged at the central server. This allows for privacy-preserving training procedures that require neither centralized data storage nor access to clients' local data. The adopted framework, FedAvg, is assessed on a dataset of spectrogram images of simulated interfered GNSS signals. Six different jammer types are effectively classified, with results comparable to a fully centralized solution that requires vast amounts of data communication and raises privacy concerns.

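The FedAvg aggregation step named in the abstract can be sketched as follows; the toy linear model and client sizes are hypothetical, but the weighted parameter averaging is the standard FedAvg rule.

import copy
import torch

def fedavg(client_states, client_sizes):
    # Server-side FedAvg: average client model parameters, weighted by local
    # dataset size, without ever seeing the raw client data.
    total = sum(client_sizes)
    avg = copy.deepcopy(client_states[0])
    for key in avg:
        avg[key] = sum(s[key] * (n / total) for s, n in zip(client_states, client_sizes))
    return avg

# Toy round: three clients share a linear classifier over spectrogram features.
model = torch.nn.Linear(4, 6)  # e.g., 6 jammer classes (hypothetical sizes)
states = [{k: v + 0.1 * i for k, v in model.state_dict().items()} for i in range(3)]
global_state = fedavg(states, client_sizes=[100, 50, 150])
model.load_state_dict(global_state)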

RobustFair: Adversarial Evaluation through Fairness Confusion Directed Gradient Search

May 18, 2023
Xuran Li, Peng Wu, Kaixiang Dong, Zhen Zhang

The trustworthiness of DNNs is often challenged by their vulnerability to minor adversarial perturbations, which may not only undermine prediction accuracy (robustness) but also cause biased predictions for similar inputs (individual fairness). Accurate fairness has recently been proposed to enforce a harmonic balance between accuracy and individual fairness. It induces the notion of a fairness confusion matrix, which categorizes predictions as true fair, true biased, false fair, and false biased. This paper proposes a harmonic evaluation approach, RobustFair, for the accurate fairness of DNNs, using adversarial perturbations crafted through fairness-confusion-directed gradient search. By using Taylor expansions to approximate the ground truths of adversarial instances, RobustFair can in particular identify the robustness defects entangled with spurious fairness, which are often elusive in robustness evaluation and missing in individual fairness evaluation. RobustFair can boost robustness and individual fairness evaluations by identifying robustness and fairness defects simultaneously. Empirical case studies on fairness benchmark datasets show that, compared with state-of-the-art white-box robustness and individual fairness testing approaches, RobustFair detects 1.77-11.87 times as many adversarial perturbations, yielding 1.83-13.12 times as many biased instances and 1.53-8.22 times as many false instances. The adversarial instances can then be effectively exploited to improve the accurate fairness (and hence the accuracy and individual fairness) of the original deep neural network through retraining. The case studies further show that the adversarial instances identified by RobustFair outperform those identified by the other testing approaches, improving accurate fairness by 21% and individual fairness by 19% on multiple sensitive attributes, without any loss of accuracy, and even improving it by up to 4%.

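The four categories of the fairness confusion matrix can be illustrated with a small Python sketch, assuming "fair" means the prediction agrees between an instance and its similar counterpart (identical except for sensitive attributes) and "true" means it agrees with the ground truth; the exact definitions are the paper's accurate-fairness notion.

def fairness_confusion(pred, pred_similar, truth):
    # 'true/false': does the prediction match the ground truth?
    # 'fair/biased': does it match the prediction on the similar counterpart?
    correct = pred == truth
    fair = pred == pred_similar
    if correct and fair:
        return "true fair"
    if correct and not fair:
        return "true biased"
    if fair:
        return "false fair"
    return "false biased"

print(fairness_confusion(1, 1, 1))  # true fair
print(fairness_confusion(1, 0, 1))  # true biased
print(fairness_confusion(0, 0, 1))  # false fair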

Balancing Unobserved Confounding with a Few Unbiased Ratings in Debiased Recommendations

Apr 17, 2023
Haoxuan Li, Yanghao Xiao, Chunyuan Zheng, Peng Wu

Recommender systems are seen as an effective tool to address information overload, but it is widely known that the presence of various biases makes direct training on large-scale observational data yield sub-optimal prediction performance. In contrast, unbiased ratings obtained from randomized controlled trials or A/B tests are considered the gold standard, but in practice they are costly and small in scale. To exploit both types of data, recent works proposed using unbiased ratings to correct the parameters of propensity or imputation models trained on the biased dataset. However, existing methods fail to obtain accurate predictions in the presence of unobserved confounding or model misspecification. In this paper, we propose a theoretically guaranteed, model-agnostic balancing approach that can be applied to any existing debiasing method to combat unobserved confounding and model misspecification. The proposed approach makes full use of the unbiased data by alternately correcting model parameters learned from the biased data and adaptively learning balance coefficients of biased samples for further debiasing. Extensive real-world experiments, with our proposal deployed on four representative debiasing methods, demonstrate its effectiveness.

* Proceedings of the ACM Web Conference 2023 (WWW '23), April 30-May 4, 2023, Austin, TX, USA  
* Accepted Paper in WWW'23 
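
A rough sketch of the balancing idea on a toy linear model: alternately fit the model on weight-balanced biased data, then nudge per-sample balance coefficients to shrink the gap to the loss measured on the few unbiased ratings. The synthetic data and gradient-style updates are illustrative assumptions, not the paper's algorithm.

import numpy as np

rng = np.random.default_rng(0)
xb, yb = rng.normal(size=(500, 5)), rng.normal(size=500)  # large biased sample
xu, yu = rng.normal(size=(50, 5)), rng.normal(size=50)    # few unbiased ratings
theta, w = np.zeros(5), np.ones(500)  # model parameters and balance coefficients

for _ in range(200):
    # (1) fit the rating model on weight-balanced biased data
    grad_theta = (w * (xb @ theta - yb)) @ xb / len(xb)
    theta -= 0.1 * grad_theta
    # (2) adapt balance coefficients to shrink the gap to the unbiased loss
    err2 = (xb @ theta - yb) ** 2
    gap = np.mean(w * err2) - np.mean((xu @ theta - yu) ** 2)
    w = np.clip(w - 0.01 * gap * err2, 0.0, None)  # keep weights non-negative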

Video Event Restoration Based on Keyframes for Video Anomaly Detection

Apr 11, 2023
Zhiwei Yang, Jing Liu, Zhaoyang Wu, Peng Wu, Xiaotao Liu

Video anomaly detection (VAD) is a significant computer vision problem. Existing deep neural network (DNN) based VAD methods mostly follow the route of frame reconstruction or frame prediction. However, the lack of mining and learning of higher-level visual features and temporal context relationships in videos limits the further performance of these two approaches. Inspired by video codec theory, we introduce a brand-new VAD paradigm to break through these limitations: we propose a new task of video event restoration based on keyframes. Encouraging the DNN to infer multiple missing frames from video keyframes, so as to restore a video event, more effectively motivates it to mine and learn potential higher-level visual features and comprehensive temporal context relationships in the video. To this end, we propose a novel U-shaped Swin Transformer Network with Dual Skip Connections (USTN-DSC) for video event restoration, where a cross-attention and a temporal upsampling residual skip connection are introduced to further assist in restoring the complex static and dynamic motion object features in the video. In addition, we propose a simple and effective adjacent frame difference loss to constrain the motion consistency of the video sequence. Extensive experiments on benchmarks demonstrate that USTN-DSC outperforms most existing methods, validating the effectiveness of our method.

* Accepted by CVPR 2023 
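
One plausible form of an adjacent frame difference loss, sketched in PyTorch: penalize the discrepancy between the temporal differences of restored and ground-truth frames, which encourages motion consistency of the restored event. The L1 form and shapes are assumptions; the paper's exact formulation may differ.

import torch
import torch.nn.functional as F

def frame_difference_loss(pred, target):
    # Compare restored motion (differences of adjacent restored frames)
    # against ground-truth motion (differences of adjacent true frames).
    pred_diff = pred[:, 1:] - pred[:, :-1]        # (B, T-1, C, H, W)
    target_diff = target[:, 1:] - target[:, :-1]  # (B, T-1, C, H, W)
    return F.l1_loss(pred_diff, target_diff)

pred = torch.rand(2, 8, 3, 64, 64)    # batch of restored 8-frame clips
target = torch.rand(2, 8, 3, 64, 64)  # corresponding ground-truth clips
print(frame_difference_loss(pred, target).item())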

Bayesian data fusion with shared priors

Dec 14, 2022
Peng Wu, Tales Imbiriba, Victor Elvira, Pau Closas

The integration of data and knowledge from several sources is known as data fusion. When data are available in a distributed fashion, or when different sensors are used to infer a quantity of interest, data fusion becomes essential. In Bayesian settings, a priori information about the unknown quantities is available and possibly shared among the distributed estimators. When the local estimates are fused, such a prior might be overused unless it is properly accounted for. This paper explores the effects of shared priors in Bayesian data fusion contexts, providing fusion rules and analysis to understand the performance of such fusion as a function of the number of collaborative agents and the uncertainty of the priors. The analytical results are corroborated through experiments on a variety of estimation and classification problems.

* 31 pages 
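
The effect of a shared prior can be illustrated with a standard one-dimensional Gaussian example: multiplying M local posteriors that all used the same prior counts that prior M times, and a corrected rule removes it M-1 times in information (precision) form. The numbers below are made up, and the paper's fusion rules and analysis are more general than this sketch.

import numpy as np

def fuse_gaussian(posts, prior, correct_shared_prior=True):
    # Fuse M Gaussian posteriors (mean, var) that all used the same prior.
    # Naive product counts the shared prior M times; the correction removes
    # it M-1 times: precision lam -= (M-1)/v0, information eta -= (M-1)*m0/v0.
    lam = sum(1.0 / v for _, v in posts)  # fused precision
    eta = sum(m / v for m, v in posts)    # fused information vector
    if correct_shared_prior:
        m0, v0 = prior
        lam -= (len(posts) - 1) / v0
        eta -= (len(posts) - 1) * m0 / v0
    return eta / lam, 1.0 / lam  # fused mean and variance

posts = [(1.1, 0.5), (0.9, 0.4), (1.2, 0.6)]  # local posterior means/variances
print(fuse_gaussian(posts, prior=(0.0, 2.0)))                              # prior counted once
print(fuse_gaussian(posts, prior=(0.0, 2.0), correct_shared_prior=False))  # prior overused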

ImLiDAR: Cross-Sensor Dynamic Message Propagation Network for 3D Object Detection

Nov 17, 2022
Yiyang Shen, Rongwei Yu, Peng Wu, Haoran Xie, Lina Gong, Jing Qin, Mingqiang Wei

LiDAR and cameras, as two different sensors, supply geometric (point clouds) and semantic (RGB images) information about 3D scenes. However, it remains challenging for existing methods to fuse data from these two cross sensors so that they complement each other for high-quality 3D object detection (3OD). We propose ImLiDAR, a new 3OD paradigm that narrows the cross-sensor discrepancies by progressively fusing the multi-scale features of camera images and LiDAR point clouds. ImLiDAR provides the detection head with cross-sensor yet robustly fused features. To achieve this, two core designs exist in ImLiDAR. First, we propose a cross-sensor dynamic message propagation module to combine the best of the multi-scale image and point features. Second, we formulate a direct set prediction problem that allows designing an effective set-based detector, tackling both the inconsistency between classification and localization confidences and the sensitivity to hand-tuned hyperparameters. Moreover, the novel set-based detector is detachable and can be easily integrated into various detection networks. Comparisons on both the KITTI and SUN-RGBD datasets show clear visual and numerical improvements of ImLiDAR over twenty-three state-of-the-art 3OD methods.

* 12 pages 
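
One way to picture a dynamic cross-sensor fusion step, sketched in PyTorch: a gate predicted from both modalities mixes per-point image and point features at a single scale. The gating form and layer sizes are assumptions for illustration, not the paper's exact message propagation module.

import torch
import torch.nn as nn

class DynamicFusion(nn.Module):
    # Sketch: a learned gate dynamically weights image vs. point features,
    # assuming the two modalities are already aligned per point.
    def __init__(self, dim=128):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, img_feat, pts_feat):  # both: (N, dim)
        g = self.gate(torch.cat([img_feat, pts_feat], dim=-1))
        return g * img_feat + (1 - g) * pts_feat  # dynamically mixed feature

img, pts = torch.randn(1024, 128), torch.randn(1024, 128)
fused = DynamicFusion()(img, pts)
print(fused.shape)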