Zezheng Wang

Domain Generalization via Shuffled Style Assembly for Face Anti-Spoofing

Mar 18, 2022
Zhuo Wang, Zezheng Wang, Zitong Yu, Weihong Deng, Jiahong Li, Tingting Gao, Zhongyuan Wang

With diverse presentation attacks emerging continually, generalizable face anti-spoofing (FAS) has drawn growing attention. Most existing methods implement domain generalization (DG) on the complete representations. However, different image statistics may have unique properties for the FAS task. In this work, we separate the complete representation into content and style representations. A novel Shuffled Style Assembly Network (SSAN) is proposed to extract and reassemble different content and style features into a stylized feature space. Then, to obtain a generalized representation, a contrastive learning strategy is developed to emphasize liveness-related style information while suppressing domain-specific information. Finally, the representations of the correct assemblies are used to distinguish between living and spoofing faces during inference. On the other hand, despite decent performance, a gap remains between academia and industry due to differences in data quantity and distribution. Thus, a new large-scale benchmark for FAS is built to further evaluate the performance of algorithms in realistic settings. Both qualitative and quantitative results on existing and proposed benchmarks demonstrate the effectiveness of our methods. The code will be available at https://github.com/wangzhuo2019/SSAN.

* Accepted by CVPR2022 
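
To make the style-assembly idea concrete, here is a minimal PyTorch sketch (not the released SSAN code; the AdaIN-style statistics swap, names, and tensor shapes are assumptions): style statistics are shuffled across a batch and re-applied to style-free content features.

import torch

def shuffled_style_assembly(feat: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    # feat: (B, C, H, W) feature map from a shared backbone (shape is an assumption).
    b = feat.size(0)
    mu = feat.mean(dim=(2, 3), keepdim=True)                    # per-sample style mean
    sigma = feat.var(dim=(2, 3), keepdim=True).add(eps).sqrt()  # per-sample style std
    content = (feat - mu) / sigma                               # style-free content
    perm = torch.randperm(b, device=feat.device)                # shuffle styles across the batch
    return content * sigma[perm] + mu[perm]                     # reassemble with a foreign style

In the paper, a contrastive objective is additionally applied on such assembled features so that liveness-related style cues are emphasized and domain-specific ones are suppressed.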

Consistency Regularization for Deep Face Anti-Spoofing

Nov 25, 2021
Zezheng Wang, Zitong Yu, Xun Wang, Yunxiao Qin, Jiahong Li, Chenxu Zhao, Zhen Lei, Xin Liu, Size Li, Zhongyuan Wang

Face anti-spoofing (FAS) plays a crucial role in securing face recognition systems. Empirically, given an image, a model that produces more consistent outputs on different views of this image usually performs better, as shown in Fig. 1. Motivated by this observation, we conjecture that encouraging feature consistency across different views may be a promising way to boost FAS models. In this paper, we explore this idea thoroughly by enhancing both Embedding-level and Prediction-level Consistency Regularization (EPCR) in FAS. Specifically, at the embedding level, we design a dense similarity loss to maximize the similarities between all positions of two intermediate feature maps in a self-supervised fashion; at the prediction level, we optimize the mean squared error between the predictions of two views. Notably, our EPCR is free of annotations and can be directly integrated into semi-supervised learning schemes. Considering different application scenarios, we further design five diverse semi-supervised protocols to measure semi-supervised FAS techniques. Extensive experiments show that EPCR significantly improves the performance of several supervised and semi-supervised tasks on benchmark datasets. The code and protocols will be released at https://github.com/clks-wzz/EPCR.

* 10 tables, 4 figures 
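
A minimal sketch of the two consistency terms described above (assumed shapes and loss forms; not the released EPCR code):

import torch
import torch.nn.functional as F

def dense_similarity_loss(f1: torch.Tensor, f2: torch.Tensor) -> torch.Tensor:
    # Embedding-level term: cosine similarity between all spatial positions of two
    # intermediate feature maps f1, f2 of shape (B, C, H, W) from two views of an image.
    f1 = F.normalize(f1.flatten(2), dim=1)      # (B, C, HW), unit norm per position
    f2 = F.normalize(f2.flatten(2), dim=1)
    sim = torch.einsum('bci,bcj->bij', f1, f2)  # all-pairs position similarities
    return 1.0 - sim.mean()                     # smaller loss = more consistent views

def epcr_loss(f1, f2, p1, p2):
    # Prediction-level term: mean squared error between the predictions of the two views.
    return dense_similarity_loss(f1, f2) + F.mse_loss(p1, p2)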

Meta-Teacher For Face Anti-Spoofing

Nov 12, 2021
Yunxiao Qin, Zitong Yu, Longbin Yan, Zezheng Wang, Chenxu Zhao, Zhen Lei

Face anti-spoofing (FAS) secures face recognition against presentation attacks (PAs). Existing FAS methods usually supervise PA detectors with handcrafted binary or pixel-wise labels. However, handcrafted labels may not be the most adequate way to supervise a PA detector to learn sufficient and intrinsic spoofing cues. Instead of using handcrafted labels, we propose a novel Meta-Teacher FAS (MT-FAS) method that trains a meta-teacher to supervise PA detectors more effectively. The meta-teacher is trained in a bi-level optimization manner to learn how to supervise PA detectors so that they learn rich spoofing cues. The bi-level optimization contains two key components: 1) a lower-level training stage in which the meta-teacher supervises the detector's learning process on the training set; and 2) a higher-level training stage in which the meta-teacher's teaching performance is optimized by minimizing the detector's validation loss. Our meta-teacher differs significantly from existing teacher-student models because it is explicitly trained to teach the detector (student) better, whereas existing teachers are trained for high accuracy while neglecting teaching ability. Extensive experiments on five FAS benchmarks show that, with the proposed MT-FAS, the trained meta-teacher 1) provides better-suited supervision than both handcrafted labels and existing teacher-student models; and 2) significantly improves the performance of PA detectors.

* Accepted by IEEE TPAMI-2021 
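
A conceptual sketch of the bi-level update described above (a hypothetical implementation using the third-party `higher` library for differentiable inner loops; the loss forms, output shapes, and names are assumptions, not the authors' code):

import torch
import torch.nn.functional as F
import higher  # third-party library for differentiable inner-loop optimization

def mtfas_step(detector, meta_teacher, teacher_opt, x_train, x_val, y_val, inner_lr=0.01):
    inner_opt = torch.optim.SGD(detector.parameters(), lr=inner_lr)
    teacher_opt.zero_grad()
    with higher.innerloop_ctx(detector, inner_opt) as (fdet, diffopt):
        # Lower level: the meta-teacher produces soft targets that supervise the detector.
        soft_targets = meta_teacher(x_train)
        inner_loss = F.mse_loss(fdet(x_train), soft_targets)
        diffopt.step(inner_loss)
        # Higher level: teaching quality is measured by the updated detector's validation
        # loss; gradients flow back to the meta-teacher through the inner update.
        val_loss = F.cross_entropy(fdet(x_val), y_val)
        val_loss.backward()
    teacher_opt.step()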

PoseFace: Pose-Invariant Features and Pose-Adaptive Loss for Face Recognition

Jul 25, 2021
Qiang Meng, Xiaqing Xu, Xiaobo Wang, Yang Qian, Yunxiao Qin, Zezheng Wang, Chenxu Zhao, Feng Zhou, Zhen Lei

Despite the great success achieved by deep learning methods in face recognition, severe performance drops are observed for large pose variations in unconstrained environments (e.g., in surveillance and photo-tagging). To address this, current methods either deploy pose-specific models or frontalize faces with additional modules. Still, they ignore the fact that identity information should be consistent across poses and do not account for the data imbalance between frontal and profile face images during training. In this paper, we propose an efficient PoseFace framework that utilizes facial landmarks to disentangle pose-invariant features and exploits a pose-adaptive loss to handle the imbalance issue adaptively. Extensive experimental results on the Multi-PIE, CFP, CPLFW, and IJB benchmarks demonstrate the superiority of our method over the state of the art.
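
As a purely illustrative sketch of how a pose-adaptive loss could counter the frontal/profile imbalance (the weighting scheme, the yaw input, and all names below are assumptions, not the PoseFace formulation):

import torch

def pose_adaptive_weighting(per_sample_loss: torch.Tensor, yaw_deg: torch.Tensor,
                            alpha: float = 1.0) -> torch.Tensor:
    # Hypothetical reweighting: profile faces (large |yaw|, e.g. estimated from landmarks)
    # are rarer in training data, so their per-sample loss is up-weighted.
    w = 1.0 + alpha * (yaw_deg.abs() / 90.0)  # 1.0 for frontal, up to 1 + alpha for full profile
    return (w * per_sample_loss).mean()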


Layer-Wise Adaptive Updating for Few-Shot Image Classification

Jul 16, 2020
Yunxiao Qin, Weiguo Zhang, Zezheng Wang, Chenxu Zhao, Jingping Shi

Few-shot image classification (FSIC), which requires a model to recognize new categories from only a few images of these categories, has attracted considerable attention. Recently, meta-learning based methods have emerged as a promising direction for FSIC. Commonly, they train a meta-learner (meta-learning model) to learn easily fine-tuned weights; when solving an FSIC task, the meta-learner efficiently fine-tunes itself into a task-specific model by updating on the few images of the task. In this paper, we propose a novel meta-learning based layer-wise adaptive updating (LWAU) method for FSIC. LWAU is inspired by an interesting finding that, compared with common deep models, the meta-learner pays much more attention to updating its top layer when learning from few images. Based on this finding, we assume that the meta-learner may greatly prefer updating its top layer over its bottom layers for better FSIC performance. Therefore, in LWAU, the meta-learner is trained to learn not only easily fine-tuned weights but also its preferred layer-wise adaptive updating rule to improve its learning efficiency. Extensive experiments show that, with the layer-wise adaptive updating rule, the proposed LWAU: 1) outperforms existing few-shot classification methods by a clear margin; and 2) learns from few images at least 5 times more efficiently than existing meta-learners when solving FSIC.
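
A minimal sketch of a layer-wise adaptive inner update (assumed MAML-style parameter dictionaries and names; not the authors' code): each layer gets its own learned learning rate, so top layers can be adapted more aggressively than bottom layers.

import torch

def lwau_inner_update(params: dict, grads: dict, layer_lrs: dict) -> dict:
    # One inner-loop step: each parameter tensor is updated with its own learned
    # per-layer learning rate instead of a single shared step size.
    return {name: p - layer_lrs[name] * grads[name] for name, p in params.items()}

# Hypothetical usage: the layer_lrs values are themselves trainable and optimized in the
# outer loop, e.g. layer_lrs = {n: torch.nn.Parameter(torch.tensor(0.01)) for n in params}.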


Multi-Modal Face Anti-Spoofing Based on Central Difference Networks

Apr 17, 2020
Zitong Yu, Yunxiao Qin, Xiaobai Li, Zezheng Wang, Chenxu Zhao, Zhen Lei, Guoying Zhao

Face anti-spoofing (FAS) plays a vital role in securing face recognition systems against presentation attacks. Existing multi-modal FAS methods rely on stacked vanilla convolutions, which are weak in describing detailed intrinsic information from the modalities and easily become ineffective when the domain shifts (e.g., cross-attack and cross-ethnicity). In this paper, we extend the central difference convolutional networks (CDCN) \cite{yu2020searching} to a multi-modal version, intending to capture intrinsic spoofing patterns among three modalities (RGB, depth and infrared). Meanwhile, we also provide an elaborate study of single-modal CDCN. Our approach won first place in "Track Multi-Modal" and second place in "Track Single-Modal (RGB)" of the ChaLearn Face Anti-spoofing Attack Detection Challenge@CVPR2020 \cite{liu2020cross}. Our final submission obtains 1.02$\pm$0.59\% and 4.84$\pm$1.79\% ACER in "Track Multi-Modal" and "Track Single-Modal (RGB)", respectively. The code is available at https://github.com/ZitongYu/CDCN.

* 1st place in "Track Multi-Modal" of ChaLearn Face Anti-spoofing Attack Detection Challenge@CVPR2020; Accepted by CVPR2020 Media Forensics Workshop. arXiv admin note: text overlap with arXiv:2003.04092 
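
For reference, a central difference convolution can be written as a vanilla convolution minus a theta-weighted center term. Below is a minimal sketch along the lines of the public CDCN implementation (hyper-parameter values and class name are assumptions):

import torch
import torch.nn as nn
import torch.nn.functional as F

class CDConv2d(nn.Module):
    # Central difference convolution: y = conv(x) - theta * conv_center(x), blending a
    # vanilla convolution with a central-difference term.
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1, theta=0.7):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding, bias=False)
        self.theta = theta

    def forward(self, x):
        out = self.conv(x)
        if self.theta == 0:
            return out
        # The central-difference term reduces to the kernel summed over its spatial
        # extent, applied as a 1x1 convolution on the center pixel.
        k_diff = self.conv.weight.sum(dim=(2, 3), keepdim=True)
        out_diff = F.conv2d(x, k_diff, bias=None, stride=1, padding=0)
        return out - self.theta * out_diff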

Deep Spatial Gradient and Temporal Depth Learning for Face Anti-spoofing

Mar 18, 2020
Zezheng Wang, Zitong Yu, Chenxu Zhao, Xiangyu Zhu, Yunxiao Qin, Qiusheng Zhou, Feng Zhou, Zhen Lei

Face anti-spoofing is critical to the security of face recognition systems. Depth-supervised learning has proven to be one of the most effective methods for face anti-spoofing. Despite this great success, most previous works still formulate the problem as a single-frame multi-task one by simply augmenting the loss with depth, neglecting detailed fine-grained information and the interplay between facial depth and motion patterns. In contrast, we design a new approach to detect presentation attacks from multiple frames based on two insights: 1) detailed discriminative clues (e.g., spatial gradient magnitude) between living and spoofing faces may be discarded by stacked vanilla convolutions, and 2) the dynamics of 3D moving faces provide important clues for detecting spoofing faces. The proposed method captures discriminative details via a Residual Spatial Gradient Block (RSGB) and efficiently encodes spatio-temporal information with a Spatio-Temporal Propagation Module (STPM). Moreover, a novel Contrastive Depth Loss is presented for more accurate depth supervision. To assess the efficacy of our method, we also collect a Double-modal Anti-spoofing Dataset (DMAD) that provides an actual depth map for each sample. Experiments demonstrate that the proposed approach achieves state-of-the-art results on five benchmark datasets: OULU-NPU, SiW, CASIA-MFSD, Replay-Attack, and the new DMAD. Code will be available at https://github.com/clks-wzz/FAS-SGTD.

* Accepted by CVPR2020 (oral) 
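
As a small illustration of the kind of fine-grained cue the RSGB is meant to preserve, here is a hedged sketch of a per-channel Sobel gradient magnitude (not the authors' block; the kernel choice and shapes are assumptions):

import torch
import torch.nn.functional as F

def spatial_gradient_magnitude(x: torch.Tensor) -> torch.Tensor:
    # Per-channel Sobel gradient magnitude for a feature map x of shape (B, C, H, W).
    c = x.size(1)
    sobel_x = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                           device=x.device, dtype=x.dtype)
    sobel_y = sobel_x.t().contiguous()
    kx = sobel_x.view(1, 1, 3, 3).repeat(c, 1, 1, 1)  # depthwise: one kernel per channel
    ky = sobel_y.view(1, 1, 3, 3).repeat(c, 1, 1, 1)
    gx = F.conv2d(x, kx, padding=1, groups=c)
    gy = F.conv2d(x, ky, padding=1, groups=c)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)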