We have developed convolutional neural networks (CNN) for a facial expression recognition task. The goal is to classify each facial image into one of the seven facial emotion categories considered in this study. We trained CNN models with different depth using gray-scale images. We developed our models in Torch and exploited Graphics Processing Unit (GPU) computation in order to expedite the training process. In addition to the networks performing based on raw pixel data, we employed a hybrid feature strategy by which we trained a novel CNN model with the combination of raw pixel data and Histogram of Oriented Gradients (HOG) features. To reduce the overfitting of the models, we utilized different techniques including dropout and batch normalization in addition to L2 regularization. We applied cross validation to determine the optimal hyper-parameters and evaluated the performance of the developed models by looking at their training histories. We also present the visualization of different layers of a network to show what features of a face can be learned by CNN models.
Nonnegative Matrix Factorization (NMF) is an unsupervised learning algorithm that produces a linear, parts-based approximation of a data matrix. NMF constructs a nonnegative low rank basis matrix and a nonnegative low rank matrix of weights which, when multiplied together, approximate the data matrix of interest using some cost function. The NMF algorithm can be modified to include auxiliary constraints which impose task-specific penalties or restrictions on the cost function of the matrix factorization. In this paper we propose a new NMF algorithm that makes use of non-data dependent auxiliary constraints which incorporate a Toeplitz matrix into the multiplicative updating of the basis and weight matrices. We compare the facial recognition performance of our new Toeplitz Nonnegative Matrix Factorization (TNMF) algorithm to the performance of the Zellner Nonnegative Matrix Factorization (ZNMF) algorithm which makes use of data-dependent auxiliary constraints. We also compare the facial recognition performance of the two aforementioned algorithms with the performance of several preexisting constrained NMF algorithms that have non-data-dependent penalties. The facial recognition performances are evaluated using the Cambridge ORL Database of Faces and the Yale Database of Faces.
Spontaneous subtle emotions are expressed through micro-expressions, which are tiny, sudden and short-lived dynamics of facial muscles; thus poses a great challenge for visual recognition. The abrupt but significant dynamics for the recognition task are temporally sparse while the rest, irrelevant dynamics, are temporally redundant. In this work, we analyze and enforce sparsity constrains to learn significant temporal and spectral structures while eliminate irrelevant facial dynamics of micro-expressions, which would ease the challenge in the visual recognition of spontaneous subtle emotions. The hypothesis is confirmed through experimental results of automatic spontaneous subtle emotion recognition with several sparsity levels on CASME II and SMIC, the only two publicly available spontaneous subtle emotion databases. The overall performances of the automatic subtle emotion recognition are boosted when only significant dynamics are preserved from the original sequences.
Commercial application of facial recognition demands robustness to a variety of challenges such as illumination, occlusion, spoofing, disguise, etc. Disguised face recognition is one of the emerging issues for access control systems, such as security checkpoints at the borders. However, the lack of availability of face databases with a variety of disguise addons limits the development of academic research in the area. In this paper, we present a multimodal disguised face dataset to facilitate the disguised face recognition research. The presented database contains 8 facial add-ons and 7 additional combinations of these add-ons to create a variety of disguised face images. Each facial image is captured in visible, visible plus infrared, infrared, and thermal spectra. Specifically, the database contains 100 subjects divided into subset-A (30 subjects, 1 image per modality) and subset-B (70 subjects, 5 plus images per modality). We also present baseline face detection results performed on the proposed database to provide reference results and compare the performance in different modalities. Qualitative and quantitative analysis is performed to evaluate the challenging nature of disguise addons. The dataset will be publicly available with the acceptance of the research article. The database is available at: https://github.com/usmancheema89/SejongFaceDatabase.
For applications such as airport border control, biometric technologies that can process many capture subjects quickly, efficiently, with weak supervision, and with minimal discomfort are desirable. Facial recognition is particularly appealing because it is minimally invasive yet offers relatively good recognition performance. Unfortunately, the combination of weak supervision and minimal invasiveness makes even highly accurate facial recognition systems susceptible to spoofing via presentation attacks. Thus, there is great demand for an effective and low cost system capable of rejecting such attacks.To this end we introduce PARAPH -- a novel hardware extension that exploits different measurements of light polarization to yield an image space in which presentation media are readily discernible from Bona Fide facial characteristics. The PARAPH system is inexpensive with an added cost of less than 10 US dollars. The system makes two polarization measurements in rapid succession, allowing them to be approximately pixel-aligned, with a frame rate limited by the camera, not the system. There are no moving parts above the molecular level, due to the efficient use of twisted nematic liquid crystals. We present evaluation images using three presentation attack media next to an actual face -- high quality photos on glossy and matte paper and a video of the face on an LCD. In each case, the actual face in the image generated by PARAPH is structurally discernible from the presentations, which appear either as noise (print attacks) or saturated images (replay attacks).
COVID-19 pandemic and social distancing urge a reliable human face recognition system in different abnormal situations. However, there is no research which studies the influence of glass factor in facial recognition system. This paper provides a comprehensive review of glass factor. The study contains two steps: data collection and accuracy test. Data collection includes collecting human face images through different situations, such as clear glasses, glass with water and glass with mist. Based on the collected data, an existing state-of-the-art face detection and recognition system built upon MTCNN and Inception V1 deep nets is tested for further analysis. Experimental data supports that 1) the system is robust for classification when comparing real-time images and 2) it fails at determining if two images are of same person by comparing real-time disturbed image with the frontal ones.
In this paper, an effective pipeline to automatic 4D Facial Expression Recognition (4D FER) is proposed. It combines two growing but disparate ideas in Computer Vision -- computing the spatial facial deformations using tools from Riemannian geometry and magnifying them using temporal filtering. The flow of 3D faces is first analyzed to capture the spatial deformations based on the recently-developed Riemannian approach, where registration and comparison of neighboring 3D faces are led jointly. Then, the obtained temporal evolution of these deformations are fed into a magnification method in order to amplify the facial activities over the time. The latter, main contribution of this paper, allows revealing subtle (hidden) deformations which enhance the emotion classification performance. We evaluated our approach on BU-4DFE dataset, the state-of-art 94.18% average performance and an improvement that exceeds 10% in classification accuracy, after magnifying extracted geometric features (deformations), are achieved.
Automatic facial behavior analysis has a long history of studies in the intersection of computer vision, physiology and psychology. However it is only recently, with the collection of large-scale datasets and powerful machine learning methods such as deep neural networks, that automatic facial behavior analysis started to thrive. Three of its iconic tasks are automatic recognition of basic expressions (e.g. happy, sad, surprised), estimation of continuous emotions (e.g., valence and arousal), and detection of facial action units (activations of e.g. upper/inner eyebrows, nose wrinkles). Up until now these tasks have been mostly studied independently collecting a dataset for the task. We present the first and the largest study of all facial behaviour tasks learned jointly in a single multi-task, multi-domain and multi-label network, which we call FaceBehaviorNet. For this we utilize all publicly available datasets in the community (around 5M images) that study facial behaviour tasks in-the-wild. We demonstrate that training jointly an end-to-end network for all tasks has consistently better performance than training each of the single-task networks. Furthermore, we propose two simple strategies for coupling the tasks during training, co-annotation and distribution matching, and show the advantages of this approach. Finally we show that FaceBehaviorNet has learned features that encapsulate all aspects of facial behaviour, and can be successfully applied to perform tasks (compound emotion recognition) beyond the ones that it has been trained in a zero- and few-shot learning setting.
In this paper, we propose a new feature descriptor Cross-Centroid Ripple Pattern (CRIP) for facial expression recognition. CRIP encodes the transitional pattern of a facial expression by incorporating cross-centroid relationship between two ripples located at radius r1 and r2 respectively. These ripples are generated by dividing the local neighborhood region into subregions. Thus, CRIP has ability to preserve macro and micro structural variations in an extensive region, which enables it to deal with side views and spontaneous expressions. Furthermore, gradient information between cross centroid ripples provides strenght to captures prominent edge features in active patches: eyes, nose and mouth, that define the disparities between different facial expressions. Cross centroid information also provides robustness to irregular illumination. Moreover, CRIP utilizes the averaging behavior of pixels at subregions that yields robustness to deal with noisy conditions. The performance of proposed descriptor is evaluated on seven comprehensive expression datasets consisting of challenging conditions such as age, pose, ethnicity and illumination variations. The experimental results show that our descriptor consistently achieved better accuracy rate as compared to existing state-of-art approaches.
Recently, we have seen an increase in the global facial recognition market size. Despite significant advances in face recognition technology with the adoption of convolutional neural networks, there are still open challenges, as when there is makeup in the face. To address this challenge, we propose and evaluate the adoption of facial parts to fuse with current holistic representations. We propose two strategies of facial parts: one with four regions (left periocular, right periocular, nose and mouth) and another with three facial thirds (upper, middle and lower). Experimental results obtained in four public makeup face datasets and in a challenging cross-dataset protocol show that the fusion of deep features extracted of facial parts with holistic representation increases the accuracy of face verification systems and decreases the error rates, even without any retraining of the CNN models. Our proposed pipeline achieved state-of-the-art performance for the YMU dataset and competitive results for other three datasets (EMFD, FAM and M501).