Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"facial recognition": models, code, and papers

Deep Convolutional Neural Networks for Smile Recognition

Aug 26, 2015
Patrick O. Glauner

This thesis describes the design and implementation of a smile detector based on deep convolutional neural networks. It starts with a summary of neural networks, the difficulties of training them and new training methods, such as Restricted Boltzmann Machines or autoencoders. It then provides a literature review of convolutional neural networks and recurrent neural networks. In order to select databases for smile recognition, comprehensive statistics of databases popular in the field of facial expression recognition were generated and are summarized in this thesis. It then proposes a model for smile detection, of which the main part is implemented. The experimental results are discussed in this thesis and justified based on a comprehensive model selection performed. All experiments were run on a Tesla K40c GPU benefiting from a speedup of up to factor 10 over the computations on a CPU. A smile detection test accuracy of 99.45% is achieved for the Denver Intensity of Spontaneous Facial Action (DISFA) database, significantly outperforming existing approaches with accuracies ranging from 65.55% to 79.67%. This experiment is re-run under various variations, such as retaining less neutral images or only the low or high intensities, of which the results are extensively compared.

* MSc thesis 

Shared Representation Learning for Heterogeneous Face Recognition

Jun 05, 2014
Dong Yi, Zhen Lei, Shengcai Liao, Stan Z. Li

After intensive research, heterogenous face recognition is still a challenging problem. The main difficulties are owing to the complex relationship between heterogenous face image spaces. The heterogeneity is always tightly coupled with other variations, which makes the relationship of heterogenous face images highly nonlinear. Many excellent methods have been proposed to model the nonlinear relationship, but they apt to overfit to the training set, due to limited samples. Inspired by the unsupervised algorithms in deep learning, this paper proposes an novel framework for heterogeneous face recognition. We first extract Gabor features at some localized facial points, and then use Restricted Boltzmann Machines (RBMs) to learn a shared representation locally to remove the heterogeneity around each facial point. Finally, the shared representations of local RBMs are connected together and processed by PCA. Two problems (Sketch-Photo and NIR-VIS) and three databases are selected to evaluate the proposed method. For Sketch-Photo problem, we obtain perfect results on the CUFS database. For NIR-VIS problem, we produce new state-of-the-art performance on the CASIA HFB and NIR-VIS 2.0 databases.


Predicting Performance of a Face Recognition System Based on Image Quality

Oct 24, 2015
Abhishek Dutta

In this dissertation, we present a generative model to capture the relation between facial image quality features (like pose, illumination direction, etc) and face recognition performance. Such a model can be used to predict the performance of a face recognition system. Since the model is based solely on image quality features, performance predictions can be done even before the actual recognition has taken place thereby facilitating many preemptive action. A practical limitation of such a data driven generative model is the limited nature of training data set. To address this limitation, we have developed a Bayesian approach to model the distribution of recognition performance measure based on the number of match and non-match scores in small regions of the image quality space. Random samples drawn from these models provide the initial data essential for training the generative model. Experiment results based on six face recognition systems operating on three independent data sets show that the proposed performance prediction model can accurately predict face recognition performance using an accurate and unbiased Image Quality Assessor (IQA). Furthermore, our results show that variability in the unaccounted quality space -- the image quality features not considered by the IQA -- is the major factor causing inaccuracies in predicted performance.

* PhD thesis publicly defended at the University of Twente (Netherlands) on April 24, 2015 at 12.45 

Gaze-enhanced Crossmodal Embeddings for Emotion Recognition

Apr 30, 2022
Ahmed Abdou, Ekta Sood, Philipp Müller, Andreas Bulling

Emotional expressions are inherently multimodal -- integrating facial behavior, speech, and gaze -- but their automatic recognition is often limited to a single modality, e.g. speech during a phone call. While previous work proposed crossmodal emotion embeddings to improve monomodal recognition performance, despite its importance, an explicit representation of gaze was not included. We propose a new approach to emotion recognition that incorporates an explicit representation of gaze in a crossmodal emotion embedding framework. We show that our method outperforms the previous state of the art for both audio-only and video-only emotion classification on the popular One-Minute Gradual Emotion Recognition dataset. Furthermore, we report extensive ablation experiments and provide detailed insights into the performance of different state-of-the-art gaze representations and integration strategies. Our results not only underline the importance of gaze for emotion recognition but also demonstrate a practical and highly effective approach to leveraging gaze information for this task.


Cross-Database Micro-Expression Recognition: A Benchmark

Dec 19, 2018
Yuan Zong, Wenming Zheng, Xiaopeng Hong, Chuangao Tang, Zhen Cui, Guoying Zhao

Cross-database micro-expression recognition (CDMER) is one of recently emerging and interesting problem in micro-expression analysis. CDMER is more challenging than the conventional micro-expression recognition (MER), because the training and testing samples in CDMER come from different micro-expression databases, resulting in the inconsistency of the feature distributions between the training and testing sets. In this paper, we contribute to this topic from three aspects. First, we establish a CDMER experimental evaluation protocol aiming to allow the researchers to conveniently work on this topic and provide a standard platform for evaluating their proposed methods. Second, we conduct benchmark experiments by using NINE state-of-the-art domain adaptation (DA) methods and SIX popular spatiotemporal descriptors for respectively investigating CDMER problem from two different perspectives. Third, we propose a novel DA method called region selective transfer regression (RSTR) to deal with the CDMER task. Our RSTR takes advantage of one important cue for recognizing micro-expressions, i.e., the different contributions of the facial local regions in MER. The overall superior performance of RSTR demonstrates that taking into consideration the important cues benefiting MER, e.g., the facial local region information, contributes to develop effective DA methods for dealing with CDMER problem.

* 13 pages 

Few-Data Guided Learning Upon End-to-End Point Cloud Network for 3D Face Recognition

Mar 31, 2021
Yi Yu, Feipeng Da, Ziyu Zhang

3D face recognition has shown its potential in many application scenarios. Among numerous 3D face recognition methods, deep-learning-based methods have developed vigorously in recent years. In this paper, an end-to-end deep learning network entitled Sur3dNet-Face for point-cloud-based 3D face recognition is proposed. The network uses PointNet as the backbone, which is a successful point cloud classification solution but does not work properly in face recognition. Supplemented with modifications in network architecture and a few-data guided learning framework based on Gaussian process morphable model, the backbone is successfully modified for 3D face recognition. Different from existing methods training with a large amount of data in multiple datasets, our method uses Spring2003 subset of FRGC v2.0 for training which contains only 943 facial scans, and the network is well trained with the guidance of such a small amount of real data. Without fine-tuning on the test set, the Rank-1 Recognition Rate (RR1) is achieved as follows: 98.85% on FRGC v2.0 dataset and 99.33% on Bosphorus dataset, which proves the effectiveness and the potentiality of our method.

* 9 pages, 5 figures 

Analyzing the Impact of Shape & Context on the Face Recognition Performance of Deep Networks

Aug 05, 2022
Sandipan Banerjee, Walter Scheirer, Kevin Bowyer, Patrick Flynn

In this article, we analyze how changing the underlying 3D shape of the base identity in face images can distort their overall appearance, especially from the perspective of deep face recognition. As done in popular training data augmentation schemes, we graphically render real and synthetic face images with randomly chosen or best-fitting 3D face models to generate novel views of the base identity. We compare deep features generated from these images to assess the perturbation these renderings introduce into the original identity. We perform this analysis at various degrees of facial yaw with the base identities varying in gender and ethnicity. Additionally, we investigate if adding some form of context and background pixels in these rendered images, when used as training data, further improves the downstream performance of a face recognition model. Our experiments demonstrate the significance of facial shape in accurate face matching and underpin the importance of contextual data for network training.


Detection of Face using Viola Jones and Recognition using Back Propagation Neural Network

Jan 28, 2017
Smriti Tikoo, Nitin Malik

Detection and recognition of the facial images of people is an intricate problem which has garnered much attention during recent years due to its ever increasing applications in numerous fields. It continues to pose a challenge in finding a robust solution to it. Its scope extends to catering the security, commercial and law enforcement applications. Research for moreover a decade on this subject has brought about remarkable development with the modus operandi like human computer interaction, biometric analysis and content based coding of images, videos and surveillance. A trivial task for brain but cumbersome to be imitated artificially. The commonalities in faces does pose a problem on various grounds but features such as skin color, gender differentiate a person from the other. In this paper the facial detection has been carried out using Viola-Jones algorithm and recognition of face has been done using Back Propagation Neural Network (BPNN).

* Int J. Computer Science and Mobile Computing, vol. 5, issue 5, pp. 288-295 (May 2016) 
* ISSN 2320-088X, 8 pages, 5 figures, 1 table 

Skeleton Based Sign Language Recognition Using Whole-body Keypoints

Mar 16, 2021
Songyao Jiang, Bin Sun, Lichen Wang, Yue Bai, Kunpeng Li, Yun Fu

Sign language is a visual language that is used by deaf or speech impaired people to communicate with each other. Sign language is always performed by fast transitions of hand gestures and body postures, requiring a great amount of knowledge and training to understand it. Sign language recognition becomes a useful yet challenging task in computer vision. Skeleton-based action recognition is becoming popular that it can be further ensembled with RGB-D based method to achieve state-of-the-art performance. However, skeleton-based recognition can hardly be applied to sign language recognition tasks, majorly because skeleton data contains no indication of hand gestures or facial expressions. Inspired by the recent development of whole-body pose estimation \cite{jin2020whole}, we propose recognizing sign language based on the whole-body key points and features. The recognition results are further ensembled with other modalities of RGB and optical flows to improve the accuracy further. In the challenge about isolated sign language recognition hosted by ChaLearn using a new large-scale multi-modal Turkish Sign Language dataset (AUTSL). Our method achieved leading accuracy in both the development phase and test phase. This manuscript is a fact sheet version. Our workshop paper version will be released soon. Our code has been made available at

* This submission is a preprint fact sheet version of our work at CVPR2021 Challenge on Looking at People Large Scale Signer Independent Isolated SLR 

A Deeper Look at Facial Expression Dataset Bias

Apr 25, 2019
Shan Li, Weihong Deng

Datasets play an important role in the progress of facial expression recognition algorithms, but they may suffer from obvious biases caused by different cultures and collection conditions. To look deeper into this bias, we first conduct comprehensive experiments on dataset recognition and crossdataset generalization tasks, and for the first time explore the intrinsic causes of the dataset discrepancy. The results quantitatively verify that current datasets have a strong buildin bias and corresponding analyses indicate that the conditional probability distributions between source and target datasets are different. However, previous researches are mainly based on shallow features with limited discriminative ability under the assumption that the conditional distribution remains unchanged across domains. To address these issues, we further propose a novel deep Emotion-Conditional Adaption Network (ECAN) to learn domain-invariant and discriminative feature representations, which can match both the marginal and the conditional distributions across domains simultaneously. In addition, the largely ignored expression class distribution bias is also addressed by a learnable re-weighting parameter, so that the training and testing domains can share similar class distribution. Extensive cross-database experiments on both lab-controlled datasets (CK+, JAFFE, MMI and Oulu-CASIA) and real-world databases (AffectNet, FER2013, RAF-DB 2.0 and SFEW 2.0) demonstrate that our ECAN can yield competitive performances across various facial expression transfer tasks and outperform the state-of-theart methods.