Abstract:Unsupervised visible-infrared person re-identification (USVI-ReID) aims to match individuals across visible and infrared cameras without relying on any annotation. Given the significant gap across visible and infrared modality, estimating reliable cross-modality association becomes a major challenge in USVI-ReID. Existing methods usually adopt optimal transport to associate the intra-modality clusters, which is prone to propagating the local cluster errors, and also overlooks global instance-level relations. By mining and attending to the visible-infrared modality bias, this paper focuses on addressing cross-modality learning from two aspects: bias-mitigated global association and modality-invariant representation learning. Motivated by the camera-aware distance rectification in single-modality re-ID, we propose modality-aware Jaccard distance to mitigate the distance bias caused by modality discrepancy, so that more reliable cross-modality associations can be estimated through global clustering. To further improve cross-modality representation learning, a `split-and-contrast' strategy is designed to obtain modality-specific global prototypes. By explicitly aligning these prototypes under global association guidance, modality-invariant yet ID-discriminative representation learning can be achieved. While conceptually simple, our method obtains state-of-the-art performance on benchmark VI-ReID datasets and outperforms existing methods by a significant margin, validating its effectiveness.




Abstract:Video anomaly detection is a complex task, and the principle of "divide and conquer" is often regarded as an effective approach to tackling intricate issues. It's noteworthy that recent methods in video anomaly detection have revealed the application of the divide and conquer philosophy (albeit with distinct perspectives from traditional usage), yielding impressive outcomes. This paper systematically reviews these literatures from six dimensions, aiming to enhance the use of the divide and conquer strategy in video anomaly detection. Furthermore, based on the insights gained from this review, a novel approach is presented, which integrates human skeletal frameworks with video data analysis techniques. This method achieves state-of-the-art performance on the ShanghaiTech dataset, surpassing all existing advanced methods.




Abstract:Previous approaches to detecting human anomalies in videos have typically relied on implicit modeling by directly applying the model to video or skeleton data, potentially resulting in inaccurate modeling of motion information. In this paper, we conduct an exploratory study and introduce a new idea called HKVAD (Human Kinematic-inspired Video Anomaly Detection) for video anomaly detection, which involves the explicit use of human kinematic features to detect anomalies. To validate the effectiveness and potential of this perspective, we propose a pilot method that leverages the kinematic features of the skeleton pose, with a specific focus on the walking stride, skeleton displacement at feet level, and neck level. Following this, the method employs a normalizing flow model to estimate density and detect anomalies based on the estimated density. Based on the number of kinematic features used, we have devised three straightforward variant methods and conducted experiments on two highly challenging public datasets, ShanghaiTech and UBnormal. Our method achieves good results with minimal computational resources, validating its effectiveness and potential.