"facial recognition": models, code, and papers

Shared Representation Learning for Heterogeneous Face Recognition

Jun 05, 2014
Dong Yi, Zhen Lei, Shengcai Liao, Stan Z. Li

After intensive research, heterogeneous face recognition is still a challenging problem. The main difficulties are owing to the complex relationship between heterogeneous face image spaces. The heterogeneity is always tightly coupled with other variations, which makes the relationship of heterogeneous face images highly nonlinear. Many excellent methods have been proposed to model the nonlinear relationship, but they are apt to overfit to the training set, due to limited samples. Inspired by the unsupervised algorithms in deep learning, this paper proposes a novel framework for heterogeneous face recognition. We first extract Gabor features at some localized facial points, and then use Restricted Boltzmann Machines (RBMs) to learn a shared representation locally to remove the heterogeneity around each facial point. Finally, the shared representations of local RBMs are connected together and processed by PCA. Two problems (Sketch-Photo and NIR-VIS) and three databases are selected to evaluate the proposed method. For the Sketch-Photo problem, we obtain perfect results on the CUFS database. For the NIR-VIS problem, we produce new state-of-the-art performance on the CASIA HFB and NIR-VIS 2.0 databases.

Predicting Performance of a Face Recognition System Based on Image Quality

Oct 24, 2015
Abhishek Dutta

In this dissertation, we present a generative model to capture the relation between facial image quality features (like pose, illumination direction, etc.) and face recognition performance. Such a model can be used to predict the performance of a face recognition system. Since the model is based solely on image quality features, performance predictions can be made even before the actual recognition has taken place, thereby facilitating many preemptive actions. A practical limitation of such a data-driven generative model is the limited nature of the training data set. To address this limitation, we have developed a Bayesian approach to model the distribution of the recognition performance measure based on the number of match and non-match scores in small regions of the image quality space. Random samples drawn from these models provide the initial data essential for training the generative model. Experimental results based on six face recognition systems operating on three independent data sets show that the proposed performance prediction model can accurately predict face recognition performance using an accurate and unbiased Image Quality Assessor (IQA). Furthermore, our results show that variability in the unaccounted quality space -- the image quality features not considered by the IQA -- is the major factor causing inaccuracies in predicted performance.

* PhD thesis publicly defended at the University of Twente (Netherlands) on April 24, 2015 at 12.45 
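
The Bayesian step in the abstract above, which models a performance measure from counts in a small region of the quality space and then draws random samples from it, can be illustrated with a minimal sketch. This is not the thesis's model: it assumes, purely for illustration, a uniform Beta(1, 1) prior on an error rate and a simple count of errors among the comparisons that fall in one quality-space cell. The function names and the example counts are hypothetical.

```python
import random

def sample_error_rate(num_errors, num_trials, num_samples=1000, seed=0):
    """Draw samples of an error rate from its Beta posterior.

    Assumption (not from the thesis): a uniform Beta(1, 1) prior, so that
    observing `num_errors` errors in `num_trials` comparisons gives the
    posterior Beta(1 + num_errors, 1 + num_trials - num_errors).
    """
    if not 0 <= num_errors <= num_trials:
        raise ValueError("num_errors must lie between 0 and num_trials")
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    alpha = 1 + num_errors
    beta = 1 + (num_trials - num_errors)
    return [rng.betavariate(alpha, beta) for _ in range(num_samples)]

def posterior_mean(num_errors, num_trials):
    """Closed-form mean of the same Beta posterior."""
    return (1 + num_errors) / (2 + num_trials)

# Hypothetical cell of the quality space: 20 comparisons, 2 of them errors.
samples = sample_error_rate(num_errors=2, num_trials=20)
sample_mean = sum(samples) / len(samples)
print(f"posterior mean: {posterior_mean(2, 20):.3f}, sample mean: {sample_mean:.3f}")
```

With few comparisons in a cell the posterior stays wide, so the drawn samples reflect that uncertainty rather than a single point estimate.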

Gaze-enhanced Crossmodal Embeddings for Emotion Recognition

Apr 30, 2022
Ahmed Abdou, Ekta Sood, Philipp Müller, Andreas Bulling

Emotional expressions are inherently multimodal -- integrating facial behavior, speech, and gaze -- but their automatic recognition is often limited to a single modality, e.g. speech during a phone call. Previous work proposed crossmodal emotion embeddings to improve monomodal recognition performance but, despite its importance, did not include an explicit representation of gaze. We propose a new approach to emotion recognition that incorporates an explicit representation of gaze in a crossmodal emotion embedding framework. We show that our method outperforms the previous state of the art for both audio-only and video-only emotion classification on the popular One-Minute Gradual Emotion Recognition dataset. Furthermore, we report extensive ablation experiments and provide detailed insights into the performance of different state-of-the-art gaze representations and integration strategies. Our results not only underline the importance of gaze for emotion recognition but also demonstrate a practical and highly effective approach to leveraging gaze information for this task.

Cross-Database Micro-Expression Recognition: A Benchmark

Dec 19, 2018
Yuan Zong, Wenming Zheng, Xiaopeng Hong, Chuangao Tang, Zhen Cui, Guoying Zhao

Cross-database micro-expression recognition (CDMER) is one of the recently emerging and interesting problems in micro-expression analysis. CDMER is more challenging than conventional micro-expression recognition (MER), because the training and testing samples in CDMER come from different micro-expression databases, resulting in inconsistent feature distributions between the training and testing sets. In this paper, we contribute to this topic in three respects. First, we establish a CDMER experimental evaluation protocol that allows researchers to conveniently work on this topic and provides a standard platform for evaluating their proposed methods. Second, we conduct benchmark experiments using nine state-of-the-art domain adaptation (DA) methods and six popular spatiotemporal descriptors, investigating the CDMER problem from two different perspectives. Third, we propose a novel DA method called region selective transfer regression (RSTR) to deal with the CDMER task. Our RSTR takes advantage of one important cue for recognizing micro-expressions, i.e., the different contributions of the facial local regions in MER. The overall superior performance of RSTR demonstrates that taking into consideration the important cues benefiting MER, e.g., the facial local region information, contributes to developing effective DA methods for dealing with the CDMER problem.

* 13 pages 

Few-Data Guided Learning Upon End-to-End Point Cloud Network for 3D Face Recognition

Mar 31, 2021
Yi Yu, Feipeng Da, Ziyu Zhang

3D face recognition has shown its potential in many application scenarios. Among numerous 3D face recognition methods, deep-learning-based methods have developed vigorously in recent years. In this paper, an end-to-end deep learning network entitled Sur3dNet-Face for point-cloud-based 3D face recognition is proposed. The network uses PointNet as the backbone, which is a successful point cloud classification solution but does not work properly in face recognition. Supplemented with modifications in network architecture and a few-data guided learning framework based on a Gaussian process morphable model, the backbone is successfully modified for 3D face recognition. Unlike existing methods that train on a large amount of data from multiple datasets, our method uses the Spring2003 subset of FRGC v2.0 for training, which contains only 943 facial scans, and the network is well trained with the guidance of such a small amount of real data. Without fine-tuning on the test set, the Rank-1 Recognition Rate (RR1) achieved is 98.85% on the FRGC v2.0 dataset and 99.33% on the Bosphorus dataset, which demonstrates the effectiveness and potential of our method.

* 9 pages, 5 figures 
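
The Rank-1 Recognition Rate (RR1) quoted in the abstract above is the fraction of probe samples whose single closest gallery sample has the correct identity. The sketch below illustrates that metric only; it assumes feature vectors have already been extracted and uses Euclidean distance, which may differ from the paper's matching rule. The function name and the toy data are hypothetical.

```python
import math

def rank1_recognition_rate(gallery, probes):
    """Fraction of probes whose nearest gallery entry has the same identity.

    gallery, probes: lists of (identity_label, feature_vector) pairs.
    Assumption: Euclidean distance between feature vectors.
    """
    if not gallery or not probes:
        raise ValueError("gallery and probes must be non-empty")
    correct = 0
    for probe_label, probe_vec in probes:
        # Pick the gallery entry closest to this probe.
        best_label, _ = min(gallery, key=lambda item: math.dist(item[1], probe_vec))
        if best_label == probe_label:
            correct += 1
    return correct / len(probes)

# Toy 2D features standing in for learned face embeddings.
gallery = [("alice", [0.0, 0.0]), ("bob", [1.0, 1.0]), ("carol", [0.0, 2.0])]
probes = [("alice", [0.1, -0.1]), ("bob", [0.9, 1.2]), ("carol", [0.8, 1.1])]
rate = rank1_recognition_rate(gallery, probes)
print(f"RR1 = {rate:.2%}")
```

In this toy example the third probe lies closer to the "bob" gallery entry than to "carol", so two of the three probes are matched correctly.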

Detection of Face using Viola Jones and Recognition using Back Propagation Neural Network

Jan 28, 2017
Smriti Tikoo, Nitin Malik

Detection and recognition of the facial images of people is an intricate problem which has garnered much attention in recent years due to its ever-increasing applications in numerous fields. Finding a robust solution to it remains a challenge. Its scope extends to security, commercial, and law enforcement applications. More than a decade of research on this subject has brought about remarkable developments in areas such as human-computer interaction, biometric analysis, and content-based coding of images, videos, and surveillance. It is a trivial task for the brain but cumbersome to imitate artificially. The commonalities among faces do pose a problem on various grounds, but features such as skin color and gender differentiate one person from another. In this paper, facial detection has been carried out using the Viola-Jones algorithm, and face recognition has been done using a Back Propagation Neural Network (BPNN).

* Int J. Computer Science and Mobile Computing, vol. 5, issue 5, pp. 288-295 (May 2016) 
* ISSN 2320-088X, 8 pages, 5 figures, 1 table 

Skeleton Based Sign Language Recognition Using Whole-body Keypoints

Mar 16, 2021
Songyao Jiang, Bin Sun, Lichen Wang, Yue Bai, Kunpeng Li, Yun Fu

Sign language is a visual language that is used by deaf or speech-impaired people to communicate with each other. Sign language is always performed by fast transitions of hand gestures and body postures, requiring a great amount of knowledge and training to understand it. Sign language recognition has therefore become a useful yet challenging task in computer vision. Skeleton-based action recognition is becoming popular because it can be further ensembled with RGB-D based methods to achieve state-of-the-art performance. However, skeleton-based recognition can hardly be applied to sign language recognition tasks, mainly because skeleton data contains no indication of hand gestures or facial expressions. Inspired by the recent development of whole-body pose estimation \cite{jin2020whole}, we propose recognizing sign language based on whole-body key points and features. The recognition results are further ensembled with other modalities of RGB and optical flows to improve the accuracy further. In the challenge on isolated sign language recognition hosted by ChaLearn using a new large-scale multi-modal Turkish Sign Language dataset (AUTSL), our method achieved leading accuracy in both the development phase and the test phase. This manuscript is a fact sheet version. Our workshop paper version will be released soon. Our code has been made available at

* This submission is a preprint fact sheet version of our work at CVPR2021 Challenge on Looking at People Large Scale Signer Independent Isolated SLR 

A Deeper Look at Facial Expression Dataset Bias

Apr 25, 2019
Shan Li, Weihong Deng

Datasets play an important role in the progress of facial expression recognition algorithms, but they may suffer from obvious biases caused by different cultures and collection conditions. To look deeper into this bias, we first conduct comprehensive experiments on dataset recognition and cross-dataset generalization tasks, and for the first time explore the intrinsic causes of the dataset discrepancy. The results quantitatively verify that current datasets have a strong built-in bias, and corresponding analyses indicate that the conditional probability distributions between source and target datasets are different. However, previous research is mainly based on shallow features with limited discriminative ability, under the assumption that the conditional distribution remains unchanged across domains. To address these issues, we further propose a novel deep Emotion-Conditional Adaption Network (ECAN) to learn domain-invariant and discriminative feature representations, which can match both the marginal and the conditional distributions across domains simultaneously. In addition, the largely ignored expression class distribution bias is also addressed by a learnable re-weighting parameter, so that the training and testing domains can share a similar class distribution. Extensive cross-database experiments on both lab-controlled datasets (CK+, JAFFE, MMI and Oulu-CASIA) and real-world databases (AffectNet, FER2013, RAF-DB 2.0 and SFEW 2.0) demonstrate that our ECAN can yield competitive performance across various facial expression transfer tasks and outperform the state-of-the-art methods.

Real-time Automatic Emotion Recognition from Body Gestures

Feb 20, 2014
Stefano Piana, Alessandra Staglianò, Francesca Odone, Alessandro Verri, Antonio Camurri

Although psychological research indicates that bodily expressions convey important affective information, to date research in emotion recognition has focused mainly on facial expression or voice analysis. In this paper we propose an approach to real-time automatic emotion recognition from body movements. A set of postural, kinematic, and geometrical features are extracted from sequences of 3D skeletons and fed to a multi-class SVM classifier. The proposed method has been assessed on data acquired through two different systems: a professional-grade optical motion capture system, and Microsoft Kinect. The system has been assessed on a "six emotions" recognition problem and, using a leave-one-subject-out cross-validation strategy, reached an overall recognition rate of 61.3%, which is very close to the recognition rate of 61.9% obtained by human observers. To provide further testing of the system, two games were developed, where one or two users have to interact to understand and express emotions with their body.
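
The leave-one-subject-out strategy mentioned in the abstract above holds out every sample from one subject as the test set, trains on all other subjects, and repeats this once per subject. The sketch below illustrates only that evaluation loop. The paper uses a multi-class SVM; here a 1-nearest-neighbour classifier is swapped in as a placeholder to keep the example dependency-free, and all function names and toy data are hypothetical.

```python
import math

def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test

def nearest_neighbor_predict(train_x, train_y, test_x):
    # Placeholder classifier (1-NN), standing in for the paper's multi-class SVM.
    return [
        train_y[min(range(len(train_x)), key=lambda j: math.dist(train_x[j], x))]
        for x in test_x
    ]

def loso_recognition_rate(features, labels, subject_ids, predict=nearest_neighbor_predict):
    """Overall recognition rate, pooled over all leave-one-subject-out folds."""
    correct = 0
    for _, train, test in leave_one_subject_out(subject_ids):
        preds = predict(
            [features[i] for i in train],
            [labels[i] for i in train],
            [features[i] for i in test],
        )
        correct += sum(p == labels[i] for p, i in zip(preds, test))
    return correct / len(labels)

# Toy data: three subjects, two samples each, 2D features.
features = [[1.0, 0.1], [0.1, 1.0], [0.9, 0.0], [0.0, 0.9], [1.1, 0.2], [0.9, 0.6]]
labels = ["joy", "fear", "joy", "fear", "joy", "fear"]
subjects = ["s1", "s1", "s2", "s2", "s3", "s3"]
rate = loso_recognition_rate(features, labels, subjects)
print(f"overall recognition rate: {rate:.1%}")
```

Because no subject ever appears in both the training and test sets of a fold, the resulting rate estimates how well the classifier generalizes to people it has not seen.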

End-to-end facial and physiological model for Affective Computing and applications

Jan 20, 2020
Joaquim Comas, Decky Aspandi, Xavier Binefa

In recent years, Affective Computing and its applications have become a fast-growing research topic. Furthermore, the rise of Deep Learning has brought significant improvements to emotion recognition systems compared to classical methods. In this work, we propose a multi-modal emotion recognition model based on deep learning techniques using the combination of peripheral physiological signals and facial expressions. Moreover, we present an improvement to the proposed models by introducing latent features extracted from our internal Bio Auto-Encoder (BAE). Both models are trained and evaluated on the AMIGOS dataset, reporting valence, arousal, and emotion state classification. Finally, to demonstrate a possible medical application of affective computing using deep learning techniques, we applied the proposed method to the assessment of anxiety therapy. For this purpose, a reduced multi-modal database has been collected by recording facial expressions and peripheral signals such as Electrocardiogram (ECG) and Galvanic Skin Response (GSR) of each patient. Valence and arousal estimates were extracted using the proposed model from the beginning until the end of the therapy, with successful evaluation of the different emotional changes in the temporal domain.