Chenxu Zhao

FM-ViT: Flexible Modal Vision Transformers for Face Anti-Spoofing

May 05, 2023
Ajian Liu, Zichang Tan, Zitong Yu, Chenxu Zhao, Jun Wan, Yanyan Liang, Zhen Lei, Du Zhang, Stan Z. Li, Guodong Guo

The availability of affordable multi-modal (i.e., RGB-D) sensors has brought about a surge of face anti-spoofing research. However, current multi-modal face presentation attack detection (PAD) has two defects: (1) Frameworks based on multi-modal fusion require the deployed modalities to be consistent with the training input, which seriously limits the deployment scenarios. (2) The performance of ConvNet-based models on high-fidelity datasets is increasingly limited. In this work, we present a pure transformer-based framework, dubbed the Flexible Modal Vision Transformer (FM-ViT), for face anti-spoofing, which flexibly targets any single-modal (i.e., RGB) attack scenario with the help of available multi-modal data. Specifically, FM-ViT retains a dedicated branch for each modality to capture modality-specific information and introduces the Cross-Modal Transformer Block (CMTB), which consists of two cascaded attentions, Multi-headed Mutual-Attention (MMA) and Fusion-Attention (MFA), that guide each modal branch to mine potential features from informative patch tokens and to learn modality-agnostic liveness features by enriching the modal information of its own CLS token, respectively. Experiments demonstrate that a single model trained with FM-ViT can not only flexibly evaluate samples of different modalities, but also outperforms existing single-modal frameworks by a large margin and approaches multi-modal frameworks while using fewer FLOPs and model parameters.

* 12 pages, 7 figures 
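
To make the cross-modal mechanism concrete, here is a minimal, hypothetical sketch (in PyTorch, not the authors' code) of a cross-modal block with two cascaded attentions: one in which a branch's patch tokens attend to the other modality's patch tokens, and one in which the branch's CLS token aggregates its enriched tokens. The class name, tensor layouts, and dimensions are illustrative assumptions; the exact MMA/MFA formulations are in the paper.

```python
# Hypothetical sketch: a simplified cross-modal transformer block, loosely
# following the MMA -> MFA description in the abstract. Tensors are
# (batch, tokens, dim), with token 0 as each branch's CLS token.
import torch
import torch.nn as nn

class CrossModalBlock(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.mutual_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fusion_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, own_tokens, other_tokens):
        # Mutual attention: this branch's tokens query the other modality's
        # patch tokens to mine informative cross-modal features.
        q = self.norm1(own_tokens)
        mined, _ = self.mutual_attn(q, other_tokens, other_tokens)
        own_tokens = own_tokens + mined

        # Fusion attention: the CLS token attends over its own (cross-modally
        # enriched) patch tokens to aggregate liveness cues.
        cls_tok = self.norm2(own_tokens[:, :1])
        fused, _ = self.fusion_attn(cls_tok, own_tokens, own_tokens)
        return torch.cat([own_tokens[:, :1] + fused, own_tokens[:, 1:]], dim=1)

# Example: RGB and depth branches exchanging information.
rgb = torch.randn(2, 1 + 196, 256)   # CLS + 14x14 patch tokens
depth = torch.randn(2, 1 + 196, 256)
block = CrossModalBlock()
rgb_out = block(rgb, depth[:, 1:])   # attend to the other branch's patch tokens
```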

Surveillance Face Anti-spoofing

Jan 03, 2023
Hao Fang, Ajian Liu, Jun Wan, Sergio Escalera, Chenxu Zhao, Xu Zhang, Stan Z. Li, Zhen Lei

Face anti-spoofing (FAS) is essential to secure face recognition systems from various physical attacks. However, recent research generally focuses on short-distance applications (i.e., phone unlocking) while lacking consideration of long-distance scenes (i.e., surveillance security checks). To promote relevant research and fill this gap in the community, we collect a large-scale Surveillance High-Fidelity Mask (SuHiFiMask) dataset captured under 40 surveillance scenes, which has 101 subjects from different age groups with 232 3D attacks (high-fidelity masks), 200 2D attacks (posters, portraits, and screens), and 2 adversarial attacks. In this setting, low image resolution and noise interference are new challenges for surveillance FAS. Together with the SuHiFiMask dataset, we propose a Contrastive Quality-Invariance Learning (CQIL) network to alleviate the performance degradation caused by low image quality from three aspects: (1) An Image Quality Variable module (IQV) is introduced to recover discriminative image information by incorporating a super-resolution network. (2) Generated sample pairs are used to simulate quality-variance distributions, helping the contrastive learning strategy obtain feature representations that are robust to quality variation. (3) A Separate Quality Network (SQN) is designed to learn discriminative features independent of image quality. Finally, extensive experiments verify the quality of the SuHiFiMask dataset and the superiority of the proposed CQIL.

* 15 pages, 9 figures 
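
As a rough illustration of the quality-invariance idea (not the paper's CQIL/IQV/SQN implementation), the sketch below degrades an image to simulate surveillance-quality capture and penalizes the distance between the embeddings of the clean/degraded pair. The degradation recipe, loss form, and toy encoder are assumptions.

```python
# Hypothetical sketch: simulate a quality drop and pull the clean/degraded
# embeddings together with a simple cosine-based consistency loss.
import torch
import torch.nn.functional as F

def degrade(x, scale=4):
    """Simulate surveillance-style quality loss: downsample, upsample, add noise."""
    h, w = x.shape[-2:]
    low = F.interpolate(x, size=(h // scale, w // scale), mode="bilinear",
                        align_corners=False)
    rec = F.interpolate(low, size=(h, w), mode="bilinear", align_corners=False)
    return rec + 0.02 * torch.randn_like(rec)

def quality_invariance_loss(encoder, x):
    """Encourage embeddings to be invariant to the simulated quality drop."""
    z_clean = F.normalize(encoder(x), dim=-1)
    z_noisy = F.normalize(encoder(degrade(x)), dim=-1)
    return 1.0 - (z_clean * z_noisy).sum(dim=-1).mean()  # 1 - cosine similarity

# Example with a toy encoder.
encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 64 * 64, 128))
loss = quality_invariance_loss(encoder, torch.randn(8, 3, 64, 64))
```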

Face Presentation Attack Detection

Dec 07, 2022
Zitong Yu, Chenxu Zhao, Zhen Lei

Face recognition technology has been widely used in daily interactive applications such as check-in and mobile payment due to its convenience and high accuracy. However, its vulnerability to presentation attacks (PAs) limits its reliable use in ultra-secure application scenarios. A presentation attack is first defined in the ISO standard as: a presentation to the biometric data capture subsystem with the goal of interfering with the operation of the biometric system. Specifically, PAs range from simple 2D prints and replays to more sophisticated 3D masks and partial masks. To defend face recognition systems against PAs, both academia and industry have paid extensive attention to developing face presentation attack detection (PAD) technology, also known as face anti-spoofing (FAS).

* Handbook of Face Recognition (3rd Ed.) 

Flexible-Modal Face Anti-Spoofing: A Benchmark

Feb 16, 2022
Zitong Yu, Chenxu Zhao, Kevin H. M. Cheng, Xu Cheng, Guoying Zhao

Face anti-spoofing (FAS) plays a vital role in securing face recognition systems from presentation attacks. Benefiting from maturing camera sensors, single-modal (RGB) and multi-modal (e.g., RGB+Depth) FAS have been applied in various scenarios with different sensor/modality configurations. Existing single- and multi-modal FAS methods usually train and deploy a separate model for each possible modality scenario, which might be redundant and inefficient. Can we train a unified model and flexibly deploy it under various modality scenarios? In this paper, we establish the first flexible-modal FAS benchmark with the principle `train one for all'. Specifically, with trained multi-modal (RGB+Depth+IR) FAS models, both intra- and cross-dataset testing is conducted on four flexible-modal sub-protocols (RGB, RGB+Depth, RGB+IR, and RGB+Depth+IR), as sketched below. We also investigate prevalent deep models and feature fusion strategies for flexible-modal FAS. We hope this new benchmark will facilitate future research on multi-modal FAS. The protocols and codes are available at https://github.com/ZitongYu/Flex-Modal-FAS.
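
One plausible (hypothetical) way to realize the `train one for all' idea is to keep a feature branch per modality and fuse only the modalities actually present at test time. The sketch below illustrates this with toy encoders and simple average fusion; it is not necessarily the fusion strategy studied in the benchmark, and all names and shapes are assumptions.

```python
# Hypothetical sketch: per-modality branches with fusion over whichever
# modalities are provided (RGB, RGB+Depth, RGB+IR, RGB+Depth+IR).
import torch
import torch.nn as nn

class FlexibleModalHead(nn.Module):
    def __init__(self, backbones: dict, feat_dim=128):
        super().__init__()
        self.backbones = nn.ModuleDict(backbones)  # e.g. {"rgb": ..., "depth": ..., "ir": ...}
        self.classifier = nn.Linear(feat_dim, 2)   # live vs. spoof

    def forward(self, inputs: dict):
        # Fuse only the modalities present in this sample/protocol.
        feats = [self.backbones[m](x) for m, x in inputs.items() if m in self.backbones]
        fused = torch.stack(feats, dim=0).mean(dim=0)
        return self.classifier(fused)

# Example with toy per-modality encoders.
enc = lambda: nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
model = FlexibleModalHead({"rgb": enc(), "depth": enc(), "ir": enc()})
logits = model({"rgb": torch.randn(4, 3, 32, 32), "depth": torch.randn(4, 3, 32, 32)})
```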


Makeup216: Logo Recognition with Adversarial Attention Representations

Dec 13, 2021
Junjun Hu, Yanhao Zhu, Bo Zhao, Jiexin Zheng, Chenxu Zhao, Xiangyu Zhu, Kangle Wu, Darun Tang

One of the challenges of logo recognition lies in the diversity of forms, such as symbols, text, or a combination of both; further, logos tend to be extremely concise in design yet similar in appearance, which makes learning discriminative representations difficult. To investigate the variety and representation of logos, we introduce Makeup216, the largest and most complex logo dataset in the field of makeup, captured from the real world. It comprises 216 logos and 157 brands, with 10,019 images and 37,018 annotated logo objects. In addition, we found that the marginal background around the pure logo can provide important contextual information, and we propose an adversarial attention representation framework (AAR) that attends to the logo subject and the auxiliary marginal background separately, whose representations can be combined for a better overall representation. Our proposed framework achieves competitive results on Makeup216 and another large-scale open logo dataset, which could provide fresh thinking for logo recognition. The Makeup216 dataset and the code of the proposed framework will be released soon.
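
As a loose illustration of attending to the logo subject and its marginal background separately, the sketch below uses a single learned spatial map and its complement to pool two features and concatenate them. The adversarial training of the attention in AAR is omitted, and all names, shapes, and the complementary-map choice are assumptions.

```python
# Hypothetical sketch: two-branch attention pooling over a feature map,
# separating the subject region from its complementary background.
import torch
import torch.nn as nn

class TwoBranchAttention(nn.Module):
    def __init__(self, channels=256):
        super().__init__()
        self.subject_attn = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, feat):                              # feat: (B, C, H, W)
        a = torch.sigmoid(self.subject_attn(feat))        # subject map in [0, 1]
        subject = (feat * a).mean(dim=(2, 3))             # pooled subject feature
        background = (feat * (1.0 - a)).mean(dim=(2, 3))  # complementary background
        return torch.cat([subject, background], dim=1)

feat = torch.randn(4, 256, 14, 14)
pooled = TwoBranchAttention()(feat)                       # (4, 512)
```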


Consistency Regularization for Deep Face Anti-Spoofing

Nov 25, 2021
Zezheng Wang, Zitong Yu, Xun Wang, Yunxiao Qin, Jiahong Li, Chenxu Zhao, Zhen Lei, Xin Liu, Size Li, Zhongyuan Wang

Face anti-spoofing (FAS) plays a crucial role in securing face recognition systems. Empirically, given an image, a model whose outputs are more consistent across different views of that image usually performs better, as shown in Fig. 1. Motivated by this observation, we conjecture that encouraging feature consistency across different views may be a promising way to boost FAS models. In this paper, we explore this idea thoroughly by enhancing both Embedding-level and Prediction-level Consistency Regularization (EPCR) in FAS. Specifically, at the embedding level, we design a dense similarity loss to maximize the similarities between all positions of two intermediate feature maps in a self-supervised fashion; at the prediction level, we optimize the mean squared error between the predictions of two views. Notably, our EPCR is free of annotations and can be directly integrated into semi-supervised learning schemes. Considering different application scenarios, we further design five diverse semi-supervised protocols to measure semi-supervised FAS techniques. We conduct extensive experiments showing that EPCR can significantly improve the performance of several supervised and semi-supervised tasks on benchmark datasets. The codes and protocols will be released at https://github.com/clks-wzz/EPCR.

* 10 tables, 4 figures 
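
The two consistency terms can be written compactly. The sketch below is a hypothetical rendering (not the released EPCR code) of the embedding-level dense similarity loss and the prediction-level MSE described above, with assumed tensor shapes.

```python
# Hypothetical sketch: consistency losses over two views of the same image.
# Feature maps are (B, C, H, W); predictions are e.g. per-pixel spoof maps.
import torch
import torch.nn.functional as F

def dense_similarity_loss(feat_a, feat_b):
    """Embedding-level term: maximize cosine similarity between all position
    pairs of two intermediate feature maps from two views."""
    a = F.normalize(feat_a.flatten(2), dim=1)   # (B, C, H*W), unit-norm channels
    b = F.normalize(feat_b.flatten(2), dim=1)
    sim = torch.einsum("bci,bcj->bij", a, b)    # all-pairs position similarity
    return -sim.mean()                          # maximize => minimize the negative

def prediction_consistency_loss(pred_a, pred_b):
    """Prediction-level term: mean squared error between the two views' outputs."""
    return F.mse_loss(pred_a, pred_b)

# Example
f1, f2 = torch.randn(2, 64, 16, 16), torch.randn(2, 64, 16, 16)
p1, p2 = torch.randn(2, 1, 32, 32), torch.randn(2, 1, 32, 32)
loss = dense_similarity_loss(f1, f2) + prediction_consistency_loss(p1, p2)
```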

Meta-Teacher For Face Anti-Spoofing

Nov 12, 2021
Yunxiao Qin, Zitong Yu, Longbin Yan, Zezheng Wang, Chenxu Zhao, Zhen Lei

Face anti-spoofing (FAS) secures face recognition from presentation attacks (PAs). Existing FAS methods usually supervise PA detectors with handcrafted binary or pixel-wise labels. However, handcrafted labels may not be the most adequate way to supervise PA detectors in learning sufficient and intrinsic spoofing cues. Instead of using handcrafted labels, we propose a novel Meta-Teacher FAS (MT-FAS) method to train a meta-teacher that supervises PA detectors more effectively. The meta-teacher is trained in a bi-level optimization manner to learn the ability to supervise PA detectors in learning rich spoofing cues. The bi-level optimization contains two key components: 1) a lower-level training in which the meta-teacher supervises the detector's learning process on the training set; and 2) a higher-level training in which the meta-teacher's teaching performance is optimized by minimizing the detector's validation loss. Our meta-teacher differs significantly from existing teacher-student models because the meta-teacher is explicitly trained for better teaching the detector (student), whereas existing teachers are trained for outstanding accuracy, neglecting teaching ability. Extensive experiments on five FAS benchmarks show that with the proposed MT-FAS, the trained meta-teacher 1) provides better-suited supervision than both handcrafted labels and existing teacher-student models; and 2) significantly improves the performance of PA detectors.

* Accepted by IEEE TPAMI-2021 
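
To show the shape of the bi-level loop, here is a hypothetical toy sketch with a linear detector: the lower level takes a differentiable detector step under the teacher's soft labels, and the upper level updates the teacher by backpropagating the updated detector's validation loss through that step. The real MT-FAS trains full PA detectors and differs in its loss and update details; everything here is an illustrative assumption.

```python
# Hypothetical sketch of a bi-level step: lower level = detector update under
# the meta-teacher's supervision; upper level = teacher update that minimizes
# the updated detector's validation loss (gradient flows through the inner step).
import torch
import torch.nn.functional as F

dim, inner_lr = 16, 0.1
teacher = torch.nn.Linear(dim, 1)
w_det = torch.zeros(dim, 1, requires_grad=True)        # toy detector weights
opt_teacher = torch.optim.SGD(teacher.parameters(), lr=1e-2)

def detector(x, w):                                    # toy linear PA detector
    return x @ w

def bilevel_step(x_tr, x_val, y_val):
    # Lower level: differentiable inner update under the teacher's soft labels.
    soft = torch.sigmoid(teacher(x_tr))                # teacher's soft targets
    inner = F.mse_loss(torch.sigmoid(detector(x_tr, w_det)), soft)
    g = torch.autograd.grad(inner, w_det, create_graph=True)[0]
    w_new = w_det - inner_lr * g                       # keeps the graph to the teacher

    # Upper level: optimize the teacher so the updated detector fits validation data.
    val = F.binary_cross_entropy_with_logits(detector(x_val, w_new), y_val)
    opt_teacher.zero_grad()
    val.backward()                                     # flows back through the inner step
    opt_teacher.step()

    # Commit the detector's inner update.
    with torch.no_grad():
        w_det.copy_(w_new.detach())

x_tr, x_val = torch.randn(32, dim), torch.randn(32, dim)
y_val = torch.randint(0, 2, (32, 1)).float()
bilevel_step(x_tr, x_val, y_val)
```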

3D High-Fidelity Mask Face Presentation Attack Detection Challenge

Aug 16, 2021
Ajian Liu, Chenxu Zhao, Zitong Yu, Anyang Su, Xing Liu, Zijian Kong, Jun Wan, Sergio Escalera, Hugo Jair Escalante, Zhen Lei, Guodong Guo

The threat of 3D masks to face recognition systems is increasingly serious and has drawn wide attention from researchers. To facilitate the study of detection algorithms, a large-scale High-Fidelity Mask dataset, namely CASIA-SURF HiFiMask (HiFiMask for short), has been collected. Specifically, it consists of a total of 54,600 videos recorded from 75 subjects with 225 realistic masks under 7 new kinds of sensors. Based on this dataset and Protocol 3, which evaluates both the discrimination and generalization ability of algorithms under open-set scenarios, we organized a 3D High-Fidelity Mask Face Presentation Attack Detection Challenge to boost research on 3D mask-based attack detection. It attracted 195 teams during the development phase, with a total of 18 teams qualifying for the final round. All results were verified and re-run by the organizing team and used for the final ranking. This paper presents an overview of the challenge, including the dataset used, the definition of the protocol, the calculation of the evaluation criteria, and the summary and publication of the competition results. Finally, we introduce and analyze the top-ranking algorithms, summarize our conclusions, and discuss the research directions for mask attack detection suggested by this competition.


PoseFace: Pose-Invariant Features and Pose-Adaptive Loss for Face Recognition

Jul 25, 2021
Qiang Meng, Xiaqing Xu, Xiaobo Wang, Yang Qian, Yunxiao Qin, Zezheng Wang, Chenxu Zhao, Feng Zhou, Zhen Lei

Despite the great success achieved by deep learning methods in face recognition, severe performance drops are observed under large pose variations in unconstrained environments (e.g., surveillance and photo-tagging). To address this, current methods either deploy pose-specific models or frontalize faces with additional modules. Still, they ignore the fact that identity information should be consistent across poses and do not account for the data imbalance between frontal and profile face images during training. In this paper, we propose an efficient PoseFace framework that utilizes facial landmarks to disentangle pose-invariant features and exploits a pose-adaptive loss to handle the imbalance issue adaptively. Extensive experimental results on the Multi-PIE, CFP, CPLFW, and IJB benchmarks demonstrate the superiority of our method over state-of-the-art approaches.
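
As a loose illustration of a pose-adaptive loss (not the paper's formulation), the sketch below reweights a per-sample recognition loss by an estimated yaw magnitude so that scarce profile faces count more than abundant frontal ones. The weighting scheme, the use of yaw, and all names are assumptions.

```python
# Hypothetical sketch: cross-entropy reweighted by pose magnitude, assuming a
# per-sample yaw estimate (e.g., derived from facial landmarks) is available.
import torch
import torch.nn.functional as F

def pose_adaptive_ce(logits, labels, yaw_deg, alpha=1.0):
    """Cross-entropy weighted by |yaw|: frontal ~ weight 1, full profile ~ 1 + alpha."""
    per_sample = F.cross_entropy(logits, labels, reduction="none")
    weights = 1.0 + alpha * yaw_deg.abs() / 90.0   # assumes yaw in [-90, 90] degrees
    return (weights * per_sample).mean()

# Example
logits = torch.randn(8, 1000)                      # 1000 identities
labels = torch.randint(0, 1000, (8,))
yaw = torch.tensor([2., -5., 80., -75., 10., 60., -30., 45.])
loss = pose_adaptive_ce(logits, labels, yaw)
```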
