Abstract:Supervised person re-identification methods rely heavily on high-quality cross-camera training label. This significantly hinders the deployment of re-ID models in real-world applications. The unsupervised person re-ID methods can reduce the cost of data annotation, but their performance is still far lower than the supervised ones. In this paper, we make full use of the auxiliary information mined from the datasets for multi-modal feature learning, including camera information, temporal information and spatial information. By analyzing the style bias of cameras, the characteristics of pedestrians' motion trajectories and the positions of camera network, this paper designs three modules: Time-Overlapping Constraint (TOC), Spatio-Temporal Similarity (STS) and Same-Camera Penalty (SCP) to exploit the auxiliary information. Auxiliary information can improve the model performance and inference accuracy by constructing association constraints or fusing with visual features. In addition, this paper proposes three effective training tricks, including Restricted Label Smoothing Cross Entropy Loss (RLSCE), Weight Adaptive Triplet Loss (WATL) and Dynamic Training Iterations (DTI). The tricks achieve mAP of 72.4% and 81.1% on MARS and DukeMTMC-VideoReID, respectively. Combined with auxiliary information exploiting modules, our methods achieve mAP of 89.9% on DukeMTMC, where TOC, STS and SCP all contributed considerable performance improvements. The method proposed by this paper outperforms most existing unsupervised re-ID methods and narrows the gap between unsupervised and supervised re-ID methods. Our code is at https://github.com/tenghehan/AuxUSLReID.
Abstract:A series of unsupervised video-based re-identification (re-ID) methods have been proposed to solve the problem of high labor cost required to annotate re-ID datasets. But their performance is still far lower than the supervised counterparts. In the mean time, clean datasets without noise are used in these methods, which is not realistic. In this paper, we propose to tackle this problem by learning re-ID models from automatically generated person tracklets by multiple objects tracking (MOT) algorithm. To this end, we design a tracklet-based multi-level clustering (TMC) framework to effectively learn the re-ID model from the noisy person tracklets. First, intra-tracklet isolation to reduce ID switch noise within tracklets; second, alternates between using inter-tracklet association to eliminate ID fragmentation noise and network training using the pseudo label. Extensive experiments on MARS with various manually generated noises show the effectiveness of the proposed framework. Specifically, the proposed framework achieved mAP 53.4% and rank-1 63.7% on the simulated tracklets with strongest noise, even outperforming the best existing method on clean tracklets. Based on the results, we believe that building re-ID models from automatically generated noisy tracklets is a reasonable approach and will also be an important way to make re-ID models feasible in real-world applications.