Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"facial recognition": models, code, and papers

Survey on RGB, 3D, Thermal, and Multimodal Approaches for Facial Expression Recognition: History, Trends, and Affect-related Applications

Jun 10, 2016
Ciprian Corneanu, Marc Oliu, Jeffrey F. Cohn, Sergio Escalera

Facial expressions are an important way through which humans interact socially. Building a system capable of automatically recognizing facial expressions from images and video has been an intense field of study in recent years. Interpreting such expressions remains challenging and much research is needed about the way they relate to human affect. This paper presents a general overview of automatic RGB, 3D, thermal and multimodal facial expression analysis. We define a new taxonomy for the field, encompassing all steps from face detection to facial expression recognition, and describe and classify the state of the art methods accordingly. We also present the important datasets and the bench-marking of most influential methods. We conclude with a general discussion about trends, important questions and future lines of research.


FD-GAN: Face-demorphing generative adversarial network for restoring accomplice's facial image

Nov 19, 2018
Fei Peng, Le-bing Zhang, Min Long

Face morphing attack is proved to be a serious threat to the existing face recognition systems. Although a few face morphing detection methods have been put forward, the face morphing accomplice's facial restoration remains a challenging problem. In this paper, a face-demorphing generative adversarial network (FD-GAN) is proposed to restore the accomplice's facial image. It utilizes a symmetric dual network architecture and two levels of restoration losses to separate the identity feature of the morphing accomplice. By exploiting the captured face image (containing the criminal's identity) from the face recognition system and the morphed image stored in the e-passport system (containing both criminal and accomplice's identities), the FD-GAN can effectively restore the accomplice's facial image. Experimental results and analysis demonstrate the effectiveness of the proposed scheme. It has great potential to be implemented for detecting the face morphing accomplice in a real identity verification scenario.

* 7 pages, 5 figures 

Frame-level Prediction of Facial Expressions, Valence, Arousal and Action Units for Mobile Devices

Mar 25, 2022
Andrey V. Savchenko

In this paper, we consider the problem of real-time video-based facial emotion analytics, namely, facial expression recognition, prediction of valence and arousal and detection of action unit points. We propose the novel frame-level emotion recognition algorithm by extracting facial features with the single EfficientNet model pre-trained on AffectNet. As a result, our approach may be implemented even for video analytics on mobile devices. Experimental results for the large scale Aff-Wild2 database from the third Affective Behavior Analysis in-the-wild (ABAW) Competition demonstrate that our simple model is significantly better when compared to the VggFace baseline. In particular, our method is characterized by 0.15-0.2 higher performance measures for validation sets in uni-task Expression Classification, Valence-Arousal Estimation and Expression Classification. Due to simplicity, our approach may be considered as a new baseline for all four sub-challenges.

* 6 pages, 2 figures, 5 tables 

Leveraging Recent Advances in Deep Learning for Audio-Visual Emotion Recognition

Mar 16, 2021
Liam Schoneveld, Alice Othmani, Hazem Abdelkawy

Emotional expressions are the behaviors that communicate our emotional state or attitude to others. They are expressed through verbal and non-verbal communication. Complex human behavior can be understood by studying physical features from multiple modalities; mainly facial, vocal and physical gestures. Recently, spontaneous multi-modal emotion recognition has been extensively studied for human behavior analysis. In this paper, we propose a new deep learning-based approach for audio-visual emotion recognition. Our approach leverages recent advances in deep learning like knowledge distillation and high-performing deep architectures. The deep feature representations of the audio and visual modalities are fused based on a model-level fusion strategy. A recurrent neural network is then used to capture the temporal dynamics. Our proposed approach substantially outperforms state-of-the-art approaches in predicting valence on the RECOLA dataset. Moreover, our proposed visual facial expression feature extraction network outperforms state-of-the-art results on the AffectNet and Google Facial Expression Comparison datasets.

* 8 pages, 3 figures, Pattern Recognition Letters 

Robust multi-camera view face recognition

Mar 30, 2010
Dakshina Ranjan Kisku, Hunny Mehrotra, Phalguni Gupta, Jamuna Kanta Sing

This paper presents multi-appearance fusion of Principal Component Analysis (PCA) and generalization of Linear Discriminant Analysis (LDA) for multi-camera view offline face recognition (verification) system. The generalization of LDA has been extended to establish correlations between the face classes in the transformed representation and this is called canonical covariate. The proposed system uses Gabor filter banks for characterization of facial features by spatial frequency, spatial locality and orientation to make compensate to the variations of face instances occurred due to illumination, pose and facial expression changes. Convolution of Gabor filter bank to face images produces Gabor face representations with high dimensional feature vectors. PCA and canonical covariate are then applied on the Gabor face representations to reduce the high dimensional feature spaces into low dimensional Gabor eigenfaces and Gabor canonical faces. Reduced eigenface vector and canonical face vector are fused together using weighted mean fusion rule. Finally, support vector machines (SVM) have trained with augmented fused set of features and perform the recognition task. The system has been evaluated with UMIST face database consisting of multiview faces. The experimental results demonstrate the efficiency and robustness of the proposed system for multi-view face images with high recognition rates. Complexity analysis of the proposed system is also presented at the end of the experimental results.

* 10 pages, 3 figures, IJCA 

Linear Disentangled Representation Learning for Facial Actions

Jan 11, 2017
Xiang Xiang, Trac D. Tran

Limited annotated data available for the recognition of facial expression and action units embarrasses the training of deep networks, which can learn disentangled invariant features. However, a linear model with just several parameters normally is not demanding in terms of training data. In this paper, we propose an elegant linear model to untangle confounding factors in challenging realistic multichannel signals such as 2D face videos. The simple yet powerful model does not rely on huge training data and is natural for recognizing facial actions without explicitly disentangling the identity. Base on well-understood intuitive linear models such as Sparse Representation based Classification (SRC), previous attempts require a prepossessing of explicit decoupling which is practically inexact. Instead, we exploit the low-rank property across frames to subtract the underlying neutral faces which are modeled jointly with sparse representation on the action components with group sparsity enforced. On the extended Cohn-Kanade dataset (CK+), our one-shot automatic method on raw face videos performs as competitive as SRC applied on manually prepared action components and performs even better than SRC in terms of true positive rate. We apply the model to the even more challenging task of facial action unit recognition, verified on the MPI Face Video Database (MPI-VDB) achieving a decent performance. All the programs and data have been made publicly available.

* Codes available at and arXiv admin note: text overlap with arXiv:1410.1606 

Peak-Piloted Deep Network for Facial Expression Recognition

Jan 03, 2017
Xiangyun Zhao, Xiaodan Liang, Luoqi Liu, Teng Li, Yugang Han, Nuno Vasconcelos, Shuicheng Yan

Objective functions for training of deep networks for face-related recognition tasks, such as facial expression recognition (FER), usually consider each sample independently. In this work, we present a novel peak-piloted deep network (PPDN) that uses a sample with peak expression (easy sample) to supervise the intermediate feature responses for a sample of non-peak expression (hard sample) of the same type and from the same subject. The expression evolving process from non-peak expression to peak expression can thus be implicitly embedded in the network to achieve the invariance to expression intensities. A special purpose back-propagation procedure, peak gradient suppression (PGS), is proposed for network training. It drives the intermediate-layer feature responses of non-peak expression samples towards those of the corresponding peak expression samples, while avoiding the inverse. This avoids degrading the recognition capability for samples of peak expression due to interference from their non-peak expression counterparts. Extensive comparisons on two popular FER datasets, Oulu-CASIA and CK+, demonstrate the superiority of the PPDN over state-ofthe-art FER methods, as well as the advantages of both the network structure and the optimization strategy. Moreover, it is shown that PPDN is a general architecture, extensible to other tasks by proper definition of peak and non-peak samples. This is validated by experiments that show state-of-the-art performance on pose-invariant face recognition, using the Multi-PIE dataset.

* Published in ECCV 2016 

The Intelligent ICU Pilot Study: Using Artificial Intelligence Technology for Autonomous Patient Monitoring

Sep 26, 2018
Anis Davoudi, Kumar Rohit Malhotra, Benjamin Shickel, Scott Siegel, Seth Williams, Matthew Ruppert, Emel Bihorac, Tezcan Ozrazgat-Baslanti, Patrick J. Tighe, Azra Bihorac, Parisa Rashidi

Currently, many critical care indices are repetitively assessed and recorded by overburdened nurses, e.g. physical function or facial pain expressions of nonverbal patients. In addition, many essential information on patients and their environment are not captured at all, or are captured in a non-granular manner, e.g. sleep disturbance factors such as bright light, loud background noise, or excessive visitations. In this pilot study, we examined the feasibility of using pervasive sensing technology and artificial intelligence for autonomous and granular monitoring of critically ill patients and their environment in the Intensive Care Unit (ICU). As an exemplar prevalent condition, we also characterized delirious and non-delirious patients and their environment. We used wearable sensors, light and sound sensors, and a high-resolution camera to collected data on patients and their environment. We analyzed collected data using deep learning and statistical analysis. Our system performed face detection, face recognition, facial action unit detection, head pose detection, facial expression recognition, posture recognition, actigraphy analysis, sound pressure and light level detection, and visitation frequency detection. We were able to detect patient's face (Mean average precision (mAP)=0.94), recognize patient's face (mAP=0.80), and their postures (F1=0.94). We also found that all facial expressions, 11 activity features, visitation frequency during the day, visitation frequency during the night, light levels, and sound pressure levels during the night were significantly different between delirious and non-delirious patients (p-value<0.05). In summary, we showed that granular and autonomous monitoring of critically ill patients and their environment is feasible and can be used for characterizing critical care conditions and related environment factors.


A Comprehensive Study on Face Recognition Biases Beyond Demographics

Mar 02, 2021
Philipp Terhörst, Jan Niklas Kolf, Marco Huber, Florian Kirchbuchner, Naser Damer, Aythami Morales, Julian Fierrez, Arjan Kuijper

Face recognition (FR) systems have a growing effect on critical decision-making processes. Recent works have shown that FR solutions show strong performance differences based on the user's demographics. However, to enable a trustworthy FR technology, it is essential to know the influence of an extended range of facial attributes on FR beyond demographics. Therefore, in this work, we analyse FR bias over a wide range of attributes. We investigate the influence of 47 attributes on the verification performance of two popular FR models. The experiments were performed on the publicly available MAADFace attribute database with over 120M high-quality attribute annotations. To prevent misleading statements about biased performances, we introduced control group based validity values to decide if unbalanced test data causes the performance differences. The results demonstrate that also many non-demographic attributes strongly affect the recognition performance, such as accessories, hair-styles and colors, face shapes, or facial anomalies. The observations of this work show the strong need for further advances in making FR system more robust, explainable, and fair. Moreover, our findings might help to a better understanding of how FR networks work, to enhance the robustness of these networks, and to develop more generalized bias-mitigating face recognition solutions.

* Under review in IEEE Transactions on Technology and Society 

Multi-task, multi-label and multi-domain learning with residual convolutional networks for emotion recognition

Feb 19, 2018
Gerard Pons, David Masip

Automated emotion recognition in the wild from facial images remains a challenging problem. Although recent advances in Deep Learning have supposed a significant breakthrough in this topic, strong changes in pose, orientation and point of view severely harm current approaches. In addition, the acquisition of labeled datasets is costly, and current state-of-the-art deep learning algorithms cannot model all the aforementioned difficulties. In this paper, we propose to apply a multi-task learning loss function to share a common feature representation with other related tasks. Particularly we show that emotion recognition benefits from jointly learning a model with a detector of facial Action Units (collective muscle movements). The proposed loss function addresses the problem of learning multiple tasks with heterogeneously labeled data, improving previous multi-task approaches. We validate the proposal using two datasets acquired in non controlled environments, and an application to predict compound facial emotion expressions.

* Preprint submitted to IJCV