Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

"facial recognition": models, code, and papers

Logical Consistency and Greater Descriptive Power for Facial Hair Attribute Learning

Feb 22, 2023
Haiyu Wu, Grace Bezold, Aman Bhatta, Kevin W. Bowyer

Figure 1 for Logical Consistency and Greater Descriptive Power for Facial Hair Attribute Learning

Figure 2 for Logical Consistency and Greater Descriptive Power for Facial Hair Attribute Learning

Figure 3 for Logical Consistency and Greater Descriptive Power for Facial Hair Attribute Learning

Figure 4 for Logical Consistency and Greater Descriptive Power for Facial Hair Attribute Learning

Face attribute research has so far used only simple binary attributes for facial hair; e.g., beard / no beard. We have created a new, more descriptive facial hair annotation scheme and applied it to create a new facial hair attribute dataset, FH37K. Face attribute research also so far has not dealt with logical consistency and completeness. For example, in prior research, an image might be classified as both having no beard and also having a goatee (a type of beard). We show that the test accuracy of previous classification methods on facial hair attribute classification drops significantly if logical consistency of classifications is enforced. We propose a logically consistent prediction loss, LCPLoss, to aid learning of logical consistency across attributes, and also a label compensation training strategy to eliminate the problem of no positive prediction across a set of related attributes. Using an attribute classifier trained on FH37K, we investigate how facial hair affects face recognition accuracy, including variation across demographics. Results show that similarity and difference in facial hairstyle have important effects on the impostor and genuine score distributions in face recognition.

Via

Access Paper or Ask Questions

Towards Semi-Supervised Deep Facial Expression Recognition with An Adaptive Confidence Margin

Mar 24, 2022
Hangyu Li, Nannan Wang, Xi Yang, Xiaoyu Wang, Xinbo Gao

Figure 1 for Towards Semi-Supervised Deep Facial Expression Recognition with An Adaptive Confidence Margin

Figure 2 for Towards Semi-Supervised Deep Facial Expression Recognition with An Adaptive Confidence Margin

Figure 3 for Towards Semi-Supervised Deep Facial Expression Recognition with An Adaptive Confidence Margin

Figure 4 for Towards Semi-Supervised Deep Facial Expression Recognition with An Adaptive Confidence Margin

Only parts of unlabeled data are selected to train models for most semi-supervised learning methods, whose confidence scores are usually higher than the pre-defined threshold (i.e., the confidence margin). We argue that the recognition performance should be further improved by making full use of all unlabeled data. In this paper, we learn an Adaptive Confidence Margin (Ada-CM) to fully leverage all unlabeled data for semi-supervised deep facial expression recognition. All unlabeled samples are partitioned into two subsets by comparing their confidence scores with the adaptively learned confidence margin at each training epoch: (1) subset I including samples whose confidence scores are no lower than the margin; (2) subset II including samples whose confidence scores are lower than the margin. For samples in subset I, we constrain their predictions to match pseudo labels. Meanwhile, samples in subset II participate in the feature-level contrastive objective to learn effective facial expression features. We extensively evaluate Ada-CM on four challenging datasets, showing that our method achieves state-of-the-art performance, especially surpassing fully-supervised baselines in a semi-supervised manner. Ablation study further proves the effectiveness of our method. The source code is available at https://github.com/hangyu94/Ada-CM.

* Accepted by CVPR2022

Via

Access Paper or Ask Questions

STPrivacy: Spatio-Temporal Tubelet Sparsification and Anonymization for Privacy-preserving Action Recognition

Jan 08, 2023
Ming Li, Jun Liu, Hehe Fan, Jia-Wei Liu, Jiahe Li, Mike Zheng Shou, Jussi Keppo

Figure 1 for STPrivacy: Spatio-Temporal Tubelet Sparsification and Anonymization for Privacy-preserving Action Recognition

Figure 2 for STPrivacy: Spatio-Temporal Tubelet Sparsification and Anonymization for Privacy-preserving Action Recognition

Figure 3 for STPrivacy: Spatio-Temporal Tubelet Sparsification and Anonymization for Privacy-preserving Action Recognition

Figure 4 for STPrivacy: Spatio-Temporal Tubelet Sparsification and Anonymization for Privacy-preserving Action Recognition

Recently privacy-preserving action recognition (PPAR) has been becoming an appealing video understanding problem. Nevertheless, existing works focus on the frame-level (spatial) privacy preservation, ignoring the privacy leakage from a whole video and destroying the temporal continuity of actions. In this paper, we present a novel PPAR paradigm, i.e., performing privacy preservation from both spatial and temporal perspectives, and propose a STPrivacy framework. For the first time, our STPrivacy applies vision Transformers to PPAR and regards a video as a sequence of spatio-temporal tubelets, showing outstanding advantages over previous convolutional methods. Specifically, our STPrivacy adaptively treats privacy-containing tubelets in two different manners. The tubelets irrelevant to actions are directly abandoned, i.e., sparsification, and not published for subsequent tasks. In contrast, those highly involved in actions are anonymized, i.e., anonymization, to remove private information. These two transformation mechanisms are complementary and simultaneously optimized in our unified framework. Because there is no large-scale benchmarks, we annotate five privacy attributes for two of the most popular action recognition datasets, i.e., HMDB51 and UCF101, and conduct extensive experiments on them. Moreover, to verify the generalization ability of our STPrivacy, we further introduce a privacy-preserving facial expression recognition task and conduct experiments on a large-scale video facial attributes dataset, i.e., Celeb-VHQ. The thorough comparisons and visualization analysis demonstrate our significant superiority over existing works. The appendix contains more details and visualizations.

Via

Access Paper or Ask Questions

FERV39k: A Large-Scale Multi-Scene Dataset for Facial Expression Recognition in Videos

Mar 20, 2022
Yan Wang, Yixuan Sun, Yiwen Huang, Zhongying Liu, Shuyong Gao, Wei Zhang, Weifeng Ge, Wenqiang Zhang

Figure 1 for FERV39k: A Large-Scale Multi-Scene Dataset for Facial Expression Recognition in Videos

Figure 2 for FERV39k: A Large-Scale Multi-Scene Dataset for Facial Expression Recognition in Videos

Figure 3 for FERV39k: A Large-Scale Multi-Scene Dataset for Facial Expression Recognition in Videos

Figure 4 for FERV39k: A Large-Scale Multi-Scene Dataset for Facial Expression Recognition in Videos

Current benchmarks for facial expression recognition (FER) mainly focus on static images, while there are limited datasets for FER in videos. It is still ambiguous to evaluate whether performances of existing methods remain satisfactory in real-world application-oriented scenes. For example, the "Happy" expression with high intensity in Talk-Show is more discriminating than the same expression with low intensity in Official-Event. To fill this gap, we build a large-scale multi-scene dataset, coined as FERV39k. We analyze the important ingredients of constructing such a novel dataset in three aspects: (1) multi-scene hierarchy and expression class, (2) generation of candidate video clips, (3) trusted manual labelling process. Based on these guidelines, we select 4 scenarios subdivided into 22 scenes, annotate 86k samples automatically obtained from 4k videos based on the well-designed workflow, and finally build 38,935 video clips labeled with 7 classic expressions. Experiment benchmarks on four kinds of baseline frameworks were also provided and further analysis on their performance across different scenes and some challenges for future research were given. Besides, we systematically investigate key components of DFER by ablation studies. The baseline framework and our project will be available.

* Accepted for CVPR2022

Via

Access Paper or Ask Questions

2D+3D facial expression recognition via embedded tensor manifold regularization

Jan 29, 2022
Yunfang Fu, Qiuqi Ruan, Ziyan Luo, Gaoyun An, Yi Jin, Jun Wan

Figure 1 for 2D+3D facial expression recognition via embedded tensor manifold regularization

Figure 2 for 2D+3D facial expression recognition via embedded tensor manifold regularization

Figure 3 for 2D+3D facial expression recognition via embedded tensor manifold regularization

Figure 4 for 2D+3D facial expression recognition via embedded tensor manifold regularization

In this paper, a novel approach via embedded tensor manifold regularization for 2D+3D facial expression recognition (FERETMR) is proposed. Firstly, 3D tensors are constructed from 2D face images and 3D face shape models to keep the structural information and correlations. To maintain the local structure (geometric information) of 3D tensor samples in the low-dimensional tensors space during the dimensionality reduction, the $\ell_0$-norm of the core tensors and a tensor manifold regularization scheme embedded on core tensors are adopted via a low-rank truncated Tucker decomposition on the generated tensors. As a result, the obtained factor matrices will be used for facial expression classification prediction. To make the resulting tensor optimization more tractable, $\ell_1$-norm surrogate is employed to relax $\ell_0$-norm and hence the resulting tensor optimization problem has a nonsmooth objective function due to the $\ell_1$-norm and orthogonal constraints from the orthogonal Tucker decomposition. To efficiently tackle this tensor optimization problem, we establish the first-order optimality condition in terms of stationary points, and then design a block coordinate descent (BCD) algorithm with convergence analysis and the computational complexity. Numerical results on BU-3DFE database and Bosphorus databases demonstrate the effectiveness of our proposed approach.

Via

Access Paper or Ask Questions

A Systematic Evaluation of Domain Adaptation in Facial Expression Recognition

Jun 29, 2021
Yan San Kong, Varsha Suresh, Jonathan Soh, Desmond C. Ong

Figure 1 for A Systematic Evaluation of Domain Adaptation in Facial Expression Recognition

Figure 2 for A Systematic Evaluation of Domain Adaptation in Facial Expression Recognition

Figure 3 for A Systematic Evaluation of Domain Adaptation in Facial Expression Recognition

Figure 4 for A Systematic Evaluation of Domain Adaptation in Facial Expression Recognition

Facial Expression Recognition is a commercially important application, but one common limitation is that applications often require making predictions on out-of-sample distributions, where target images may have very different properties from the images that the model was trained on. How well, or badly, do these models do on unseen target domains? In this paper, we provide a systematic evaluation of domain adaptation in facial expression recognition. Using state-of-the-art transfer learning techniques and six commonly-used facial expression datasets (three collected in the lab and three "in-the-wild"), we conduct extensive round-robin experiments to examine the classification accuracies for a state-of-the-art CNN model. We also perform multi-source experiments where we examine a model's ability to transfer from multiple source datasets, including (i) within-setting (e.g., lab to lab), (ii) cross-setting (e.g., in-the-wild to lab), (iii) mixed-setting (e.g., lab and wild to lab) transfer learning experiments. We find sobering results that the accuracy of transfer learning is not high, and varies idiosyncratically with the target dataset, and to a lesser extent the source dataset. Generally, the best settings for transfer include fine-tuning the weights of a pre-trained model, and we find that training with more datasets, regardless of setting, improves transfer performance. We end with a discussion of the need for more -- and regular -- systematic investigations into the generalizability of FER models, especially for deployed applications.

Via

Access Paper or Ask Questions

Driving Safety Prediction and Safe Route Mapping Using In-vehicle and Roadside Data

Sep 12, 2022
Yufei Huang, Mohsen Jafari, Peter Jin

Figure 1 for Driving Safety Prediction and Safe Route Mapping Using In-vehicle and Roadside Data

Figure 2 for Driving Safety Prediction and Safe Route Mapping Using In-vehicle and Roadside Data

Figure 3 for Driving Safety Prediction and Safe Route Mapping Using In-vehicle and Roadside Data

Figure 4 for Driving Safety Prediction and Safe Route Mapping Using In-vehicle and Roadside Data

Risk assessment of roadways is commonly practiced based on historical crash data. Information on driver behaviors and real-time traffic situations is sometimes missing. In this paper, the Safe Route Mapping (SRM) model, a methodology for developing dynamic risk heat maps of roadways, is extended to consider driver behaviors when making predictions. An Android App is designed to gather drivers' information and upload it to a server. On the server, facial recognition extracts drivers' data, such as facial landmarks, gaze directions, and emotions. The driver's drowsiness and distraction are detected, and driving performance is evaluated. Meanwhile, dynamic traffic information is captured by a roadside camera and uploaded to the same server. A longitudinal-scanline-based arterial traffic video analytics is applied to recognize vehicles from the video to build speed and trajectory profiles. Based on these data, a LightGBM model is introduced to predict conflict indices for drivers in the next one or two seconds. Then, multiple data sources, including historical crash counts and predicted traffic conflict indicators, are combined using a Fuzzy logic model to calculate risk scores for road segments. The proposed SRM model is illustrated using data collected from an actual traffic intersection and a driving simulation platform. The prediction results show that the model is accurate, and the added driver behavior features will improve the model's performance. Finally, risk heat maps are generated for visualization purposes. The authorities can use the dynamic heat map to designate safe corridors and dispatch law enforcement and drivers for early warning and trip planning.

* 12 pages, 19 figures, 9 tables

Via

Access Paper or Ask Questions

Understanding Cross Domain Presentation Attack Detection for Visible Face Recognition

Nov 03, 2021
Jennifer Hamblin, Kshitij Nikhal, Benjamin S. Riggan

Figure 1 for Understanding Cross Domain Presentation Attack Detection for Visible Face Recognition

Figure 2 for Understanding Cross Domain Presentation Attack Detection for Visible Face Recognition

Figure 3 for Understanding Cross Domain Presentation Attack Detection for Visible Face Recognition

Figure 4 for Understanding Cross Domain Presentation Attack Detection for Visible Face Recognition

Face signatures, including size, shape, texture, skin tone, eye color, appearance, and scars/marks, are widely used as discriminative, biometric information for access control. Despite recent advancements in facial recognition systems, presentation attacks on facial recognition systems have become increasingly sophisticated. The ability to detect presentation attacks or spoofing attempts is a pressing concern for the integrity, security, and trust of facial recognition systems. Multi-spectral imaging has been previously introduced as a way to improve presentation attack detection by utilizing sensors that are sensitive to different regions of the electromagnetic spectrum (e.g., visible, near infrared, long-wave infrared). Although multi-spectral presentation attack detection systems may be discriminative, the need for additional sensors and computational resources substantially increases complexity and costs. Instead, we propose a method that exploits information from infrared imagery during training to increase the discriminability of visible-based presentation attack detection systems. We introduce (1) a new cross-domain presentation attack detection framework that increases the separability of bonafide and presentation attacks using only visible spectrum imagery, (2) an inverse domain regularization technique for added training stability when optimizing our cross-domain presentation attack detection framework, and (3) a dense domain adaptation subnetwork to transform representations between visible and non-visible domains.

Via

Access Paper or Ask Questions

Does lossy image compression affect racial bias within face recognition?

Aug 16, 2022
Seyma Yucer, Matt Poyser, Noura Al Moubayed, Toby P. Breckon

Figure 1 for Does lossy image compression affect racial bias within face recognition?

Figure 2 for Does lossy image compression affect racial bias within face recognition?

Figure 3 for Does lossy image compression affect racial bias within face recognition?

Figure 4 for Does lossy image compression affect racial bias within face recognition?

Yes - This study investigates the impact of commonplace lossy image compression on face recognition algorithms with regard to the racial characteristics of the subject. We adopt a recently proposed racial phenotype-based bias analysis methodology to measure the effect of varying levels of lossy compression across racial phenotype categories. Additionally, we determine the relationship between chroma-subsampling and race-related phenotypes for recognition performance. Prior work investigates the impact of lossy JPEG compression algorithm on contemporary face recognition performance. However, there is a gap in how this impact varies with different race-related inter-sectional groups and the cause of this impact. Via an extensive experimental setup, we demonstrate that common lossy image compression approaches have a more pronounced negative impact on facial recognition performance for specific racial phenotype categories such as darker skin tones (by up to 34.55\%). Furthermore, removing chroma-subsampling during compression improves the false matching rate (up to 15.95\%) across all phenotype categories affected by the compression, including darker skin tones, wide noses, big lips, and monolid eye categories. In addition, we outline the characteristics that may be attributable as the underlying cause of such phenomenon for lossy compression algorithms such as JPEG.

Via

Access Paper or Ask Questions

Local Quadruple Pattern: A Novel Descriptor for Facial Image Recognition and Retrieval

Jan 03, 2022
Soumendu Chakraborty, Satish Kumar Singh, Pavan Chakraborty

Figure 1 for Local Quadruple Pattern: A Novel Descriptor for Facial Image Recognition and Retrieval

Figure 2 for Local Quadruple Pattern: A Novel Descriptor for Facial Image Recognition and Retrieval

Figure 3 for Local Quadruple Pattern: A Novel Descriptor for Facial Image Recognition and Retrieval

Figure 4 for Local Quadruple Pattern: A Novel Descriptor for Facial Image Recognition and Retrieval

In this paper a novel hand crafted local quadruple pattern (LQPAT) is proposed for facial image recognition and retrieval. Most of the existing hand-crafted descriptors encodes only a limited number of pixels in the local neighbourhood. Under unconstrained environment the performance of these descriptors tends to degrade drastically. The major problem in increasing the local neighbourhood is that, it also increases the feature length of the descriptor. The proposed descriptor try to overcome these problems by defining an efficient encoding structure with optimal feature length. The proposed descriptor encodes relations amongst the neighbours in quadruple space. Two micro patterns are computed from the local relationships to form the descriptor. The retrieval and recognition accuracies of the proposed descriptor has been compared with state of the art hand crafted descriptors on bench mark databases namely; Caltech-face, LFW, Colour-FERET, and CASIA-face-v5. Result analysis shows that the proposed descriptor performs well under uncontrolled variations in pose, illumination, background and expressions.

* Computers & Electrical Engineering, vol-62, pp. 92-104, (2017). (Elsevier) ISSN/ISBN: 0045-7906
* arXiv admin note: substantial text overlap with arXiv:2201.00504, arXiv:2201.00511

Via

Access Paper or Ask Questions