Chenglong Li

Illumination Distillation Framework for Nighttime Person Re-Identification and A New Benchmark

Aug 31, 2023
Andong Lu, Zhang Zhang, Yan Huang, Yifan Zhang, Chenglong Li, Jin Tang, Liang Wang

Nighttime person Re-ID (person re-identification at night) is a very important and challenging task for visual surveillance, but it has not been thoroughly investigated. Under low-illumination conditions, the performance of person Re-ID methods usually deteriorates sharply. To address the low-illumination challenge in nighttime person Re-ID, this paper proposes an Illumination Distillation Framework (IDF), which utilizes illumination enhancement and illumination distillation schemes to promote the learning of Re-ID models. Specifically, IDF consists of a master branch, an illumination enhancement branch, and an illumination distillation module. The master branch is used to extract features from a nighttime image. The illumination enhancement branch first estimates an enhanced image from the nighttime image using a nonlinear curve mapping method and then extracts the enhanced features. However, nighttime and enhanced features usually contain data noise due to unstable lighting conditions and enhancement failures. To fully exploit the complementary benefits of nighttime and enhanced features while suppressing data noise, we propose an illumination distillation module. In particular, the illumination distillation module fuses the features from the two branches through a bottleneck fusion model and then uses the fused features to guide the learning of both branches in a distillation manner. In addition, we build a real-world nighttime person Re-ID dataset, named Night600, which contains 600 identities captured from different viewpoints and under different nighttime illumination conditions in complex outdoor environments. Experimental results demonstrate that our IDF achieves state-of-the-art performance on two nighttime person Re-ID datasets (i.e., Night600 and Knight). We will release our code and dataset at https://github.com/Alexadlu/IDF.

* Accepted by TMM 
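
The abstract describes a bottleneck fusion of the nighttime and enhanced features, with the fused feature guiding both branches in a distillation manner. The following is a minimal sketch of that idea; the module names, feature dimensions, and the choice of MSE as the distillation loss are assumptions for illustration, not the authors' implementation.

```python
# Hedged sketch of bottleneck fusion + feature distillation (assumed details).
import torch
import torch.nn as nn
import torch.nn.functional as F

class BottleneckFusion(nn.Module):
    def __init__(self, dim=2048, bottleneck=512):
        super().__init__()
        self.reduce = nn.Linear(2 * dim, bottleneck)  # compress concatenated features
        self.expand = nn.Linear(bottleneck, dim)      # map back to the feature space

    def forward(self, f_night, f_enhanced):
        fused = torch.cat([f_night, f_enhanced], dim=-1)
        return self.expand(F.relu(self.reduce(fused)))

def distillation_loss(f_night, f_enhanced, f_fused):
    # The fused feature acts as a teacher for both branches; it is detached so
    # gradients only update the branch (student) features.
    target = f_fused.detach()
    return F.mse_loss(f_night, target) + F.mse_loss(f_enhanced, target)

# Toy usage with random branch features.
fusion = BottleneckFusion()
f_n, f_e = torch.randn(8, 2048), torch.randn(8, 2048)
loss = distillation_loss(f_n, f_e, fusion(f_n, f_e))
```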

Erasure-based Interaction Network for RGBT Video Object Detection and A Unified Benchmark

Aug 03, 2023
Zhengzheng Tu, Qishun Wang, Hongshun Wang, Kunpeng Wang, Chenglong Li

Recently, many breakthroughs have been made in the field of Video Object Detection (VOD), but performance is still limited by the imaging limitations of RGB sensors under adverse illumination conditions. To alleviate this issue, this work introduces a new computer vision task called RGB-thermal (RGBT) VOD, which incorporates the thermal modality that is insensitive to adverse illumination conditions. To promote the research and development of RGBT VOD, we design a novel Erasure-based Interaction Network (EINet) and establish a comprehensive benchmark dataset (VT-VOD50) for this task. Traditional VOD methods often leverage temporal information by using many auxiliary frames and thus incur a large computational burden. Considering that thermal images exhibit less noise than RGB ones, we develop a negative activation function that erases the noise in RGB features with the help of thermal image features. Furthermore, benefiting from the thermal images, we rely on only a small temporal window to model spatio-temporal information, which greatly improves efficiency while maintaining detection accuracy. The VT-VOD50 dataset consists of 50 pairs of challenging RGBT video sequences with complex backgrounds, various objects, and different illuminations, collected in real traffic scenarios. Extensive experiments on VT-VOD50 demonstrate the effectiveness and efficiency of our proposed method against existing mainstream VOD methods. The code of EINet and the dataset will be released to the public for free academic usage.
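
The exact form of EINet's negative activation is not given in the abstract, but the stated idea is to erase noisy RGB responses with the help of thermal features. The sketch below is an illustrative guess at such an erasure gate, assuming spatially aligned RGB and thermal feature maps; the layer choice and channel count are assumptions.

```python
# Hedged sketch of a thermal-guided erasure gate for RGB features (assumed design).
import torch
import torch.nn as nn

class ErasureGate(nn.Module):
    def __init__(self, channels=256):
        super().__init__()
        self.gate = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, rgb_feat, thermal_feat):
        # Locations supported by the thermal feature keep their RGB response;
        # unsupported locations are attenuated toward zero ("erased").
        mask = torch.sigmoid(self.gate(thermal_feat))
        return rgb_feat * mask

# Toy usage on 8x8 feature maps.
gate = ErasureGate()
rgb, thermal = torch.randn(1, 256, 8, 8), torch.randn(1, 256, 8, 8)
cleaned = gate(rgb, thermal)
```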

Multi-query Vehicle Re-identification: Viewpoint-conditioned Network, Unified Dataset and New Metric

May 25, 2023
Aihua Zheng, Chaobin Zhang, Weijun Zhang, Chenglong Li, Jin Tang, Chang Tan, Ruoran Jia

Existing vehicle re-identification methods mainly rely on a single query, which provides limited information for vehicle representation and thus significantly hinders the performance of vehicle Re-ID in complicated surveillance networks. In this paper, we propose a more realistic and easily accessible task, called multi-query vehicle Re-ID, which leverages multiple queries to overcome the viewpoint limitation of a single one. Based on this task, we make three major contributions. First, we design a novel viewpoint-conditioned network (VCNet), which adaptively combines the complementary information from different vehicle viewpoints, for multi-query vehicle Re-ID. Moreover, to deal with the problem of missing vehicle viewpoints, we propose a cross-view feature recovery module that recovers the features of the missing viewpoints by learning the correlation between the features of available and missing viewpoints. Second, we create a unified benchmark dataset, captured by 6142 cameras in a real-life transportation surveillance system, with comprehensive viewpoints and a large number of crossed scenes for each vehicle, for multi-query vehicle Re-ID evaluation. Finally, we design a new evaluation metric, called mean cross-scene precision (mCSP), which measures the ability of cross-scene recognition by suppressing positive samples with similar viewpoints from the same camera. Comprehensive experiments validate the superiority of the proposed method against other methods, as well as the effectiveness of the designed metric in the evaluation of multi-query vehicle Re-ID.
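
To make the mCSP idea concrete, here is a rough sketch of a cross-scene precision computation for one query: positives captured by the same camera as the query (used here as a simple proxy for "similar viewpoints from the same camera") are suppressed before ranking, so only cross-scene matches contribute. The exact definition in the paper may differ; function names and the suppression rule are illustrative assumptions.

```python
# Hedged sketch of a cross-scene average precision with same-camera suppression.
import numpy as np

def csp(query_id, query_cam, gallery_ids, gallery_cams, scores):
    # Drop gallery entries of the same identity seen by the same camera.
    keep = ~((gallery_ids == query_id) & (gallery_cams == query_cam))
    ids, s = gallery_ids[keep], scores[keep]
    order = np.argsort(-s)                                   # rank by similarity
    matches = (ids[order] == query_id).astype(float)
    if matches.sum() == 0:
        return None                                          # no cross-scene positives
    precision_at_hits = np.cumsum(matches) / (np.arange(len(matches)) + 1)
    return float((precision_at_hits * matches).sum() / matches.sum())

def mean_csp(per_query_csp):
    valid = [v for v in per_query_csp if v is not None]
    return sum(valid) / len(valid)

# Toy usage: one same-camera positive is suppressed, one cross-scene positive remains.
ap = csp(1, 0, np.array([1, 1, 2]), np.array([0, 3, 1]), np.array([0.9, 0.8, 0.7]))
```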

Dynamic Enhancement Network for Partial Multi-modality Person Re-identification

May 25, 2023
Aihua Zheng, Ziling He, Zi Wang, Chenglong Li, Jin Tang

Many existing multi-modality studies are based on the assumption of modality integrity. However, the problem of missing arbitrary modalities is very common in real life; it is less studied but actually important in the task of multi-modality person re-identification (Re-ID). To this end, we design a novel dynamic enhancement network (DENet), which allows missing arbitrary modalities while maintaining the representation ability of multiple modalities, for partial multi-modality person Re-ID. To be specific, the multi-modal representation of the RGB, near-infrared (NIR), and thermal-infrared (TIR) images is learned by three branches, in which the information of missing modalities is recovered by a feature transformation module. Since the missing state may change, we design a dynamic enhancement module that adaptively enhances modality features according to the missing state, to improve the multi-modality representation. Extensive experiments on the multi-modality person Re-ID dataset RGBNT201 and the vehicle Re-ID dataset RGBNT100, in comparison with state-of-the-art methods, verify the effectiveness of our method in complex and changeable environments.
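
As a loose illustration of the "enhance according to the missing state" idea, the sketch below weights the three modality branches (including recovered features) by a gate conditioned on which modalities were actually observed. The branch layout, dimensions, and gating rule are assumptions, not DENet's actual dynamic enhancement module.

```python
# Hedged sketch of missing-state-conditioned weighting of modality branches.
import torch
import torch.nn as nn

class DynamicEnhancement(nn.Module):
    def __init__(self, dim=512, num_modalities=3):
        super().__init__()
        # One gate per modality (RGB, NIR, TIR), conditioned on the presence mask.
        self.gate = nn.Linear(num_modalities, num_modalities)

    def forward(self, feats, present):
        # feats: (B, M, D) per-modality features (recovered ones included)
        # present: (B, M) binary mask, 1 if the modality was observed
        weights = torch.softmax(self.gate(present.float()), dim=-1)  # (B, M)
        return (feats * weights.unsqueeze(-1)).sum(dim=1)            # fused (B, D)

# Toy usage: TIR is missing for the first sample.
module = DynamicEnhancement()
feats = torch.randn(2, 3, 512)
present = torch.tensor([[1, 1, 0], [1, 1, 1]])
fused = module(feats, present)
```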

Flare-Aware Cross-modal Enhancement Network for Multi-spectral Vehicle Re-identification

May 23, 2023
Aihua Zheng, Zhiqi Ma, Zi Wang, Chenglong Li

Multi-spectral vehicle re-identification aims to address the challenge of identifying vehicles in complex lighting conditions by incorporating complementary visible and infrared information. However, in harsh environments, the discriminative cues in the RGB and NIR modalities are often lost due to strong flares from vehicle lamps or sunlight, and existing multi-modal fusion methods are limited in their ability to recover these important cues. To address this problem, we propose a Flare-Aware Cross-modal Enhancement Network (FACENet) that adaptively restores flare-corrupted RGB and NIR features with guidance from the flare-immunized thermal infrared (TI) spectrum. First, to reduce the influence of locally degraded appearance due to intense flare, we propose a Mutual Flare Mask Prediction module to jointly obtain flare-corrupted masks in the RGB and NIR modalities in a self-supervised manner. Second, to use the flare-immunized TI information to enhance the masked RGB and NIR features, we propose a Flare-Aware Cross-modal Enhancement module that adaptively guides feature extraction of the masked RGB and NIR spectra with prior flare-immunized knowledge from the TI spectrum. Third, to extract common informative semantic information from RGB and NIR, we propose an Inter-modality Consistency loss that enforces semantic consistency between the two modalities. Finally, to evaluate FACENet in handling intense flare, we introduce a new multi-spectral vehicle re-ID dataset, called WMVEID863, with additional challenges such as motion blur, significant background changes, and particularly intense flare degradation. Comprehensive experiments on both the newly collected dataset and public benchmark multi-spectral vehicle re-ID datasets demonstrate the superior performance of the proposed FACENet compared to state-of-the-art methods, especially in handling strong flares. The code and dataset will be released soon.
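
The abstract does not specify the form of the Inter-modality Consistency loss, so the snippet below shows only one common way such a constraint could be written, as a cosine-similarity objective between pooled RGB and NIR features; the function name and loss form are purely illustrative assumptions.

```python
# Hedged sketch of an inter-modality consistency objective (assumed form).
import torch
import torch.nn.functional as F

def inter_modality_consistency(f_rgb, f_nir):
    # Encourage the L2-normalized RGB and NIR global features to agree.
    f_rgb = F.normalize(f_rgb, dim=-1)
    f_nir = F.normalize(f_nir, dim=-1)
    return (1.0 - (f_rgb * f_nir).sum(dim=-1)).mean()

loss = inter_modality_consistency(torch.randn(8, 2048), torch.randn(8, 2048))
```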

RGBT Tracking via Progressive Fusion Transformer with Dynamically Guided Learning

Mar 26, 2023
Yabin Zhu, Chenglong Li, Xiao Wang, Jin Tang, Zhixiang Huang

Existing Transformer-based RGBT tracking methods either use cross-attention to fuse the two modalities, or use both self-attention and cross-attention to model modality-specific and modality-shared information. However, the significant appearance gap between modalities limits the feature representation ability of certain modalities during the fusion process. To address this problem, we propose a novel Progressive Fusion Transformer called ProFormer, which progressively integrates single-modality information into the multimodal representation for robust RGBT tracking. In particular, ProFormer first uses a self-attention module to collaboratively extract the multimodal representation, and then uses two cross-attention modules to interact it with the features of the two modalities respectively. In this way, the modality-specific information can be well activated in the multimodal representation. Finally, a feed-forward network fuses the two interacted multimodal representations to further enhance the final multimodal representation. In addition, existing learning methods for RGBT trackers either fuse multimodal features into one feature for final classification, or exploit the relationship between the unimodal branches and the fused branch through a competitive learning strategy. However, they either ignore the learning of single-modality branches or leave one branch poorly optimized. To solve these problems, we propose a dynamically guided learning algorithm that adaptively uses well-performing branches to guide the learning of other branches, enhancing the representation ability of each branch. Extensive experiments demonstrate that the proposed ProFormer sets new state-of-the-art performance on the RGBT210, RGBT234, LasHeR, and VTUAV datasets.

* 13 pages, 9 figures 
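
The fusion pipeline described in the abstract (self-attention over the multimodal representation, two cross-attention interactions with the RGB and thermal branches, then a feed-forward fusion) can be sketched structurally as follows. Token counts, layer sizes, and the batch-first layout are assumptions; this is not the authors' configuration.

```python
# Hedged structural sketch of self-attention -> dual cross-attention -> FFN fusion.
import torch
import torch.nn as nn

class ProgressiveFusion(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_rgb = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_tir = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, rgb_tokens, tir_tokens):
        # Collaboratively build the multimodal representation.
        multimodal = torch.cat([rgb_tokens, tir_tokens], dim=1)
        multimodal, _ = self.self_attn(multimodal, multimodal, multimodal)
        # Interact it with each single-modality branch via cross-attention.
        m_rgb, _ = self.cross_rgb(multimodal, rgb_tokens, rgb_tokens)
        m_tir, _ = self.cross_tir(multimodal, tir_tokens, tir_tokens)
        # Fuse the two interacted representations.
        return self.ffn(torch.cat([m_rgb, m_tir], dim=-1))

fusion = ProgressiveFusion()
out = fusion(torch.randn(2, 64, 256), torch.randn(2, 64, 256))  # (2, 128, 256)
```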

Parallel Augmentation and Dual Enhancement for Occluded Person Re-identification

Oct 11, 2022
Zi Wang, Huaibo Huang, Aihua Zheng, Chenglong Li, Ran He

Occluded person re-identification (Re-ID), the task of searching for images of the same person in occluded environments, has attracted considerable attention over the past decades. Recent approaches concentrate on improving performance on occluded data through data/feature augmentation or by using extra models to predict occlusions. However, they ignore the imbalance problem in the test set and do not fully utilize the information from the training data. To alleviate these problems, we propose a simple but effective method with Parallel Augmentation and Dual Enhancement (PADE) that is robust on both occluded and non-occluded data and does not require any auxiliary clues. First, we design a parallel augmentation mechanism (PAM) for occluded Re-ID to generate more suitable occluded data and mitigate the negative effects of unbalanced data. Second, we propose a dual enhancement strategy (DES) for global and local features to exploit contextual information and details. Experimental results on widely used occluded datasets (Occluded-Duke, Partial-REID, and Occluded-ReID) and non-occluded datasets (Market-1501 and DukeMTMC-reID) validate the effectiveness of our method. The code will be available soon.

* Submitted to AAAI 
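
As a simple illustration of generating occluded samples in parallel with the originals (in the spirit of, but not identical to, the parallel augmentation mechanism), the sketch below pastes a random noise patch over each image; the occluder choice and patch-size range are assumptions.

```python
# Hedged sketch of a synthetic occlusion augmentation for Re-ID training batches.
import torch

def occlude(images, max_frac=0.4):
    # images: (B, C, H, W); paste a random rectangle of noise over each image.
    b, c, h, w = images.shape
    out = images.clone()
    for i in range(b):
        oh = int(h * torch.empty(1).uniform_(0.1, max_frac).item())
        ow = int(w * torch.empty(1).uniform_(0.1, max_frac).item())
        top = torch.randint(0, h - oh + 1, (1,)).item()
        left = torch.randint(0, w - ow + 1, (1,)).item()
        out[i, :, top:top + oh, left:left + ow] = torch.randn(c, oh, ow)
    return out

batch = torch.rand(4, 3, 256, 128)   # typical Re-ID input size
occluded_batch = occlude(batch)      # processed in parallel with the clean batch
```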

Hand Hygiene Assessment via Joint Step Segmentation and Key Action Scorer

Sep 25, 2022
Chenglong Li, Qiwen Zhu, Tubiao Liu, Jin Tang, Yu Su

Hand hygiene is a standard six-step hand-washing procedure proposed by the World Health Organization (WHO). However, there is no effective way to supervise medical staff in performing hand hygiene, which brings a potential risk of disease spread. In this work, we propose a new computer vision task called hand hygiene assessment to provide intelligent supervision of hand hygiene for medical staff. Existing action assessment works usually make an overall quality prediction on an entire video. However, the internal structure of the hand hygiene action is important in hand hygiene assessment. Therefore, we propose a novel fine-grained learning framework that performs step segmentation and key action scoring jointly for accurate hand hygiene assessment. Existing temporal segmentation methods usually employ a multi-stage convolutional network to improve segmentation robustness, but they easily lead to over-segmentation due to the lack of long-range dependencies. To address this issue, we design a multi-stage convolution-transformer network for step segmentation. Based on the observation that each hand-washing step involves several key actions that determine the hand-washing quality, we design a set of key action scorers to evaluate the quality of the key actions in each step. In addition, hand hygiene assessment lacks a unified dataset. Therefore, under the supervision of medical staff, we contribute a video dataset that contains 300 video sequences with fine-grained annotations. Extensive experiments on the dataset show that our method assesses hand hygiene videos well and achieves outstanding performance.
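
To illustrate the "one scorer per step" idea, the sketch below pools the frame features assigned to a step by the segmentation stage and maps them to a quality score; the feature size, pooling, and regression head are illustrative assumptions rather than the paper's design.

```python
# Hedged sketch of a per-step key-action quality scorer.
import torch
import torch.nn as nn

class KeyActionScorer(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, step_features):
        # step_features: (T, D) frame features segmented into this hand-washing step.
        pooled = step_features.mean(dim=0)        # temporal average pooling
        return torch.sigmoid(self.head(pooled))   # quality score in [0, 1]

# One scorer per WHO hand-washing step.
scorers = nn.ModuleList([KeyActionScorer() for _ in range(6)])
scores = [scorer(torch.randn(40, 512)) for scorer in scorers]
```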

Ubiquitous Indoor Positioning and Tracking for Industrial Internet-of-Things: A Channel Response Perspective

Sep 25, 2022
Chenglong Li, Emmeric Tanghe, Sofie Pollin, Wout Joseph

The future of industrial location-aided applications is shaped by the ubiquity of Internet-of-Things (IoT) devices. As an increasing number of commercial off-the-shelf radio devices support channel response collection, it is possible to achieve fine-grained position estimation at a relatively low cost. In this article, we focus on channel response-based positioning and tracking for industrial IoT applications. We first give an overview of the state of the art (SOTA) in channel response-enabled localization, which is further classified into two categories, i.e., device-based and contact-free schemes. Then we propose a taxonomy for these complementary approaches based on the techniques involved. Finally, we discuss the practical issues of the SOTA methods for real-world applications and point out future research opportunities for channel response-based positioning and tracking.

Contact-Free Multi-Target Tracking Using Distributed Massive MIMO-OFDM Communication System: Prototype and Analysis

Aug 26, 2022
Chenglong Li, Sibren De Bast, Yang Miao, Emmeric Tanghe, Sofie Pollin, Wout Joseph

Wireless-based human activity recognition has become an essential technology that enables contact-free human-machine and human-environment interactions. In this paper, we consider contact-free multi-target tracking (MTT) based on available communication systems. A radar-like prototype is built upon a sub-6 GHz distributed massive multiple-input multiple-output (MIMO) orthogonal frequency-division multiplexing (OFDM) communication system. Specifically, the raw channel state information (CSI) is calibrated in the frequency and antenna domains before being used for tracking. Then the target CSIs reflected or scattered from the moving pedestrians are extracted. To evade the complex association problem of distributed massive MIMO-based MTT, we propose to use a complex Bayesian compressive sensing (CBCS) algorithm to estimate the targets' locations directly from the extracted target-of-interest CSI signal. The estimated locations from CBCS are fed to a Gaussian mixture probability hypothesis density (GM-PHD) filter for tracking. A multi-pedestrian tracking experiment is conducted in a room of size 6.5 m × 10 m to evaluate the performance of the proposed algorithm. According to the experimental results, we achieve 75th and 95th percentile accuracies of 12.7 cm and 18.2 cm for single-person tracking, and 28.9 cm and 45.7 cm for multi-person tracking, respectively. Furthermore, the proposed algorithm performs tracking in real time, which is promising for practical MTT use cases.
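
For readers unfamiliar with the percentile-based accuracy figures quoted above, the snippet below shows how 75th and 95th percentile errors are computed from per-frame position errors; the error values are made-up placeholders, not the paper's data.

```python
# Hedged illustration of percentile tracking-error statistics (placeholder values).
import numpy as np

errors_m = np.array([0.05, 0.08, 0.11, 0.13, 0.16, 0.21, 0.30])  # hypothetical errors in metres
p75, p95 = np.percentile(errors_m, [75, 95])
print(f"75th percentile: {p75 * 100:.1f} cm, 95th percentile: {p95 * 100:.1f} cm")
```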
