Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"facial recognition": models, code, and papers

Simultaneous regression and feature learning for facial landmarking

Apr 24, 2019
Janez Križaj, Peter Peer, Vitomir Štruc, Simon Dobrišek

Face alignment (or facial landmarking) is an important task in many face-related applications, ranging from registration, tracking and animation to higher-level classification problems such as face, expression or attribute recognition. While several solutions have been presented in the literature for this task so far, reliably locating salient facial features across a wide range of posses still remains challenging. To address this issue, we propose in this paper a novel method for automatic facial landmark localization in 3D face data designed specifically to address appearance variability caused by significant pose variations. Our method builds on recent cascaded-regression-based methods to facial landmarking and uses a gating mechanism to incorporate multiple linear cascaded regression models each trained for a limited range of poses into a single powerful landmarking model capable of processing arbitrary posed input data. We develop two distinct approaches around the proposed gating mechanism: i) the first uses a gated multiple ridge descent (GRID) mechanism in conjunction with established (hand-crafted) HOG features for face alignment and achieves state-of-the-art landmarking performance across a wide range of facial poses, ii) the second simultaneously learns multiple-descent directions as well as binary features (SMUF) that are optimal for the alignment tasks and in addition to competitive landmarking results also ensures extremely rapid processing. We evaluate both approaches in rigorous experiments on several popular datasets of 3D face images, i.e., the FRGCv2 and Bosphorus 3D Face datasets and image collections F and G from the University of Notre Dame. The results of our evaluation show that both approaches are competitive in comparison to the state-of-the-art, while exhibiting considerable robustness to pose variations.

Access Paper or Ask Questions

Training Deep Face Recognition Systems with Synthetic Data

Feb 16, 2018
Adam Kortylewski, Andreas Schneider, Thomas Gerig, Bernhard Egger, Andreas Morel-Forster, Thomas Vetter

Recent advances in deep learning have significantly increased the performance of face recognition systems. The performance and reliability of these models depend heavily on the amount and quality of the training data. However, the collection of annotated large datasets does not scale well and the control over the quality of the data decreases with the size of the dataset. In this work, we explore how synthetically generated data can be used to decrease the number of real-world images needed for training deep face recognition systems. In particular, we make use of a 3D morphable face model for the generation of images with arbitrary amounts of facial identities and with full control over image variations, such as pose, illumination, and background. In our experiments with an off-the-shelf face recognition software we observe the following phenomena: 1) The amount of real training data needed to train competitive deep face recognition systems can be reduced significantly. 2) Combining large-scale real-world data with synthetic data leads to an increased performance. 3) Models trained only on synthetic data with strong variations in pose, illumination, and background perform very well across different datasets even without dataset adaptation. 4) The real-to-virtual performance gap can be closed when using synthetic data for pre-training, followed by fine-tuning with real-world images. 5) There are no observable negative effects of pre-training with synthetic data. Thus, any face recognition system in our experiments benefits from using synthetic face images. The synthetic data generator, as well as all experiments, are publicly available.

Access Paper or Ask Questions

Cross-Cultural and Cultural-Specific Production and Perception of Facial Expressions of Emotion in the Wild

Aug 13, 2018
Ramprakash Srinivasan, Aleix M. Martinez

Automatic recognition of emotion from facial expressions is an intense area of research, with a potentially long list of important application. Yet, the study of emotion requires knowing which facial expressions are used within and across cultures in the wild, not in controlled lab conditions; but such studies do not exist. Which and how many cross-cultural and cultural-specific facial expressions do people commonly use? And, what affect variables does each expression communicate to observers? If we are to design technology that understands the emotion of users, we need answers to these two fundamental questions. In this paper, we present the first large-scale study of the production and visual perception of facial expressions of emotion in the wild. We find that of the 16,384 possible facial configurations that people can theoretically produce, only 35 are successfully used to transmit emotive information across cultures, and only 8 within a smaller number of cultures. Crucially, we find that visual analysis of cross-cultural expressions yields consistent perception of emotion categories and valence, but not arousal. In contrast, visual analysis of cultural-specific expressions yields consistent perception of valence and arousal, but not of emotion categories. Additionally, we find that the number of expressions used to communicate each emotion is also different, e.g., 17 expressions transmit happiness, but only 1 is used to convey disgust.

Access Paper or Ask Questions

Non-contact Pain Recognition from Video Sequences with Remote Physiological Measurements Prediction

May 18, 2021
Ruijing Yang, Ziyu Guan, Zitong Yu, Guoying Zhao, Xiaoyi Feng, Jinye Peng

Automatic pain recognition is paramount for medical diagnosis and treatment. The existing works fall into three categories: assessing facial appearance changes, exploiting physiological cues, or fusing them in a multi-modal manner. However, (1) appearance changes are easily affected by subjective factors which impedes objective pain recognition. Besides, the appearance-based approaches ignore long-range spatial-temporal dependencies that are important for modeling expressions over time; (2) the physiological cues are obtained by attaching sensors on human body, which is inconvenient and uncomfortable. In this paper, we present a novel multi-task learning framework which encodes both appearance changes and physiological cues in a non-contact manner for pain recognition. The framework is able to capture both local and long-range dependencies via the proposed attention mechanism for the learned appearance representations, which are further enriched by temporally attended physiological cues (remote photoplethysmography, rPPG) that are recovered from videos in the auxiliary task. This framework is dubbed rPPG-enriched Spatio-Temporal Attention Network (rSTAN) and allows us to establish the state-of-the-art performance of non-contact pain recognition on publicly available pain databases. It demonstrates that rPPG predictions can be used as an auxiliary task to facilitate non-contact automatic pain recognition.

* IJCAI 2021 
Access Paper or Ask Questions

Facial Landmark Point Localization using Coarse-to-Fine Deep Recurrent Neural Network

May 03, 2018
Shahar Mahpod, Rig Das, Emanuele Maiorana, Yosi Keller, Patrizio Campisi

Facial landmark point localization is a typical problem in computer vision and is extensively used for increasing accuracy of face recognition, facial expression analysis, face animation etc. In recent years, substantial effort have been deployed by many researcher to design a robust facial landmark detection system. However, it still remains as one of the most challenging tasks due to the existence of extreme poses, exaggerated facial expression, unconstrained illumination, etc. In this paper, we propose a novel coarse-to-fine deep recurrent-neural-network (RNN) based framework, which uses heat-map images for facial landmark point localization. The use of heat-map images allows us using the entire face image instead of the face initialization bounding boxes or patch images around the landmark points. Performance of our proposed framework shows significant improvement in case of handling difficult face images with higher degree of occlusion, variation of pose, large yaw angles and illumination. In comparison with the best current state-of-the-art technique a reduction of 45% in failure rate and an improvement of 11.5% in area under the curve for 300-W private test set are some of the main contributions of our proposed framework.

* arXiv admin note: text overlap with arXiv:1706.01789 by other authors 
Access Paper or Ask Questions

An adversarial learning framework for preserving users' anonymity in face-based emotion recognition

Jan 16, 2020
Vansh Narula, Zhangyang, Wang, Theodora Chaspari

Image and video-capturing technologies have permeated our every-day life. Such technologies can continuously monitor individuals' expressions in real-life settings, affording us new insights into their emotional states and transitions, thus paving the way to novel well-being and healthcare applications. Yet, due to the strong privacy concerns, the use of such technologies is met with strong skepticism, since current face-based emotion recognition systems relying on deep learning techniques tend to preserve substantial information related to the identity of the user, apart from the emotion-specific information. This paper proposes an adversarial learning framework which relies on a convolutional neural network (CNN) architecture trained through an iterative procedure for minimizing identity-specific information and maximizing emotion-dependent information. The proposed approach is evaluated through emotion classification and face identification metrics, and is compared against two CNNs, one trained solely for emotion recognition and the other trained solely for face identification. Experiments are performed using the Yale Face Dataset and Japanese Female Facial Expression Database. Results indicate that the proposed approach can learn a convolutional transformation for preserving emotion recognition accuracy and degrading face identity recognition, providing a foundation toward privacy-aware emotion recognition technologies.

Access Paper or Ask Questions

Face Recognition Using Scattering Convolutional Network

Nov 30, 2017
Shervin Minaee, Amirali Abdolrashidi, Yao Wang

Face recognition has been an active research area in the past few decades. In general, face recognition can be very challenging due to variations in viewpoint, illumination, facial expression, etc. Therefore it is essential to extract features which are invariant to some or all of these variations. Here a new image representation, called scattering transform/network, has been used to extract features from faces. The scattering transform is a kind of convolutional network which provides a powerful multi-layer representation for signals. After extraction of scattering features, PCA is applied to reduce the dimensionality of the data and then a multi-class support vector machine is used to perform recognition. The proposed algorithm has been tested on three face datasets and achieved a very high recognition rate.

Access Paper or Ask Questions

A novel database of Children's Spontaneous Facial Expressions (LIRIS-CSE)

Dec 04, 2018
Rizwan Ahmed Khan, Crenn Arthur, Alexandre Meyer, Saida Bouakaz

The computing environment is moving towards human-centered designs instead of computer centered designs and human's tend to communicate wealth of information through affective states or expressions. Traditional Human Computer Interaction based systems ignores bulk of information communicated through those affective states and just caters for user's intentional input. Generally, for evaluating and benchmarking different facial expression analysis algorithms, standardized databases are needed to enable a meaningful comparison. In the absence of comparative tests on such standardized databases it is difficult to find relative strengths and weaknesses of different facial expression recognition algorithms. Recording truly spontaneous instances of basic emotion expressions is extremely difficult, because in everyday life the basic emotions are not shown frequently. However, when they are displayed, they convey a very strong message to someone's surroundings. In this article we presented novel database for Children's Spontaneous facial Expressions (LIRIS-CSE). The database contains six universal spontaneous facial expressions shown by 12 ethnically diverse children between the ages of 6 and 12 years with mean age of 7.3 years. To the best of our knowledge, this database is first of its kind as it records and shows six universal spontaneous facial expressions of children. Previously there were few database of children expressions and all of them show posed or exaggerated expressions which are different from spontaneous or natural expressions. Thus, this database will be a milestone for human behavior researchers. This database will be a excellent resource for vision community for benchmarking and comparing results.

Access Paper or Ask Questions

Feature refinement: An expression-specific feature learning and fusion method for micro-expression recognition

Jan 13, 2021
Ling Zhou, Qirong Mao, Xiaohua Huang, Feifei Zhang, Zhihong Zhang

Micro-Expression Recognition has become challenging, as it is extremely difficult to extract the subtle facial changes of micro-expressions. Recently, several approaches proposed several expression-shared features algorithms for micro-expression recognition. However, they do not reveal the specific discriminative characteristics, which lead to sub-optimal performance. This paper proposes a novel Feature Refinement ({FR}) with expression-specific feature learning and fusion for micro-expression recognition. It aims to obtain salient and discriminative features for specific expressions and also predict expression by fusing the expression-specific features. FR consists of an expression proposal module with attention mechanism and a classification branch. First, an inception module is designed based on optical flow to obtain expression-shared features. Second, in order to extract salient and discriminative features for specific expression, expression-shared features are fed into an expression proposal module with attention factors and proposal loss. Last, in the classification branch, labels of categories are predicted by a fusion of the expression-specific features. Experiments on three publicly available databases validate the effectiveness of FR under different protocol. Results on public benchmarks demonstrate that our FR provides salient and discriminative information for micro-expression recognition. The results also show our FR achieves better or competitive performance with the existing state-of-the-art methods on micro-expression recognition.

Access Paper or Ask Questions

Skeleton Aware Multi-modal Sign Language Recognition

Mar 26, 2021
Songyao Jiang, Bin Sun, Lichen Wang, Yue Bai, Kunpeng Li, Yun Fu

Sign language is used by deaf or speech impaired people to communicate and requires great effort to master. Sign Language Recognition (SLR) aims to bridge between sign language users and others by recognizing words from given videos. It is an important yet challenging task since sign language is performed with fast and complex movement of hand gestures, body posture, and even facial expressions. Recently, skeleton-based action recognition attracts increasing attention due to the independence on subject and background variation. Furthermore, it can be a strong complement to RGB/D modalities to boost the overall recognition rate. However, skeleton-based SLR is still under exploration due to the lack of annotations on hand keypoints. Some efforts have been made to use hand detectors with pose estimators to extract hand key points and learn to recognize sign language via a Recurrent Neural Network, but none of them outperforms RGB-based methods. To this end, we propose a novel Skeleton Aware Multi-modal SLR framework (SAM-SLR) to further improve the recognition rate. Specifically, we propose a Sign Language Graph Convolution Network (SL-GCN) to model the embedded dynamics and propose a novel Separable Spatial-Temporal Convolution Network (SSTCN) to exploit skeleton features. Our skeleton-based method achieves a higher recognition rate compared with all other single modalities. Moreover, our proposed SAM-SLR framework can further enhance the performance by assembling our skeleton-based method with other RGB and depth modalities. As a result, SAM-SLR achieves the highest performance in both RGB (98.42%) and RGB-D (98.53%) tracks in 2021 Looking at People Large Scale Signer Independent Isolated SLR Challenge. Our code is available at

* This submission is a preprint version of our work SAM-SLR that ranked 1st at CVPR2021 Challenge on Large Scale Signer Independent Isolated Sign Language Recognition 
Access Paper or Ask Questions