Human beings displays their emotions using facial expressions. For human it is very easy to recognize those emotions but for computer it is very challenging. Facial expressions vary from person to person. Brightness, contrast and resolution of every random image is different. This is why recognizing facial expression is very difficult. The facial expression recognition is an active research area. In this project, we worked on recognition of seven basic human emotions. These emotions are angry, disgust, fear, happy, sad, surprise and neutral. Every image was first passed through face detection algorithm to include it in train dataset. As CNN requires large amount of data so we duplicated our data using various filter on each image. The system is trained using CNN architecture. Preprocessed images of size 80*100 is passed as input to the first layer of CNN. Three convolutional layers were used, each of which was followed by a pooling layer and then three dense layers. The dropout rate for dense layer was 20%. The model was trained by combination of two publicly available datasets JAFFED and KDEF. 90% of the data was used for training while 10% was used for testing. We achieved maximum accuracy of 78% using combined dataset.
Active illumination is a prominent complement to enhance 2D face recognition and make it more robust, e.g., to spoofing attacks and low-light conditions. In the present work we show that it is possible to adopt active illumination to enhance state-of-the-art 2D face recognition approaches with 3D features, while bypassing the complicated task of 3D reconstruction. The key idea is to project over the test face a high spatial frequency pattern, which allows us to simultaneously recover real 3D information plus a standard 2D facial image. Therefore, state-of-the-art 2D face recognition solution can be transparently applied, while from the high frequency component of the input image, complementary 3D facial features are extracted. Experimental results on ND-2006 dataset show that the proposed ideas can significantly boost face recognition performance and dramatically improve the robustness to spoofing attacks.
Detecting facial action units (AU) is one of the fundamental steps in automatic recognition of facial expression of emotions and cognitive states. Though there have been a variety of approaches proposed for this task, most of these models are trained only for the specific target AUs, and as such they fail to easily adapt to the task of recognition of new AUs (i.e., those not initially used to train the target models). In this paper, we propose a deep learning approach for facial AU detection that can easily and in a fast manner adapt to a new AU or target subject by leveraging only a few labeled samples from the new task (either an AU or subject). To this end, we propose a modeling approach based on the notion of the model-agnostic meta-learning [C. Finn and Levine, 2017], originally proposed for the general image recognition/detection tasks (e.g., the character recognition from the Omniglot dataset). Specifically, each subject and/or AU is treated as a new learning task and the model learns to adapt based on the knowledge of the previous tasks (the AUs and subjects used to pre-train the target models). Thus, given a new subject or AU, this meta-knowledge (that is shared among training and test tasks) is used to adapt the model to the new task using the notion of deep learning and model-agnostic meta-learning. We show on two benchmark datasets (BP4D and DISFA) for facial AU detection that the proposed approach can be easily adapted to new tasks (AUs/subjects). Using only a few labeled examples from these tasks, the model achieves large improvements over the baselines (i.e., non-adapted models).
Facial expression has a significant role in analyzing human cognitive state. Deriving an accurate facial appearance representation is a critical task for an automatic facial expression recognition application. This paper provides a new feature descriptor named as Quadrilateral Senary bit Pattern for facial expression recognition. The QUEST pattern encoded the intensity changes by emphasizing the relationship between neighboring and reference pixels by dividing them into two quadrilaterals in a local neighborhood. Thus, the resultant gradient edges reveal the transitional variation information, that improves the classification rate by discriminating expression classes. Moreover, it also enhances the capability of the descriptor to deal with viewpoint variations and illumination changes. The trine relationship in a quadrilateral structure helps to extract the expressive edges and suppressing noise elements to enhance the robustness to noisy conditions. The QUEST pattern generates a six-bit compact code, which improves the efficiency of the FER system with more discriminability. The effectiveness of the proposed method is evaluated by conducting several experiments on four benchmark datasets: MMI, GEMEP-FERA, OULU-CASIA, and ISED. The experimental results show better performance of the proposed method as compared to existing state-art-the approaches.
This paper proposes a fusion-based gender recognition method which uses facial images as input. Firstly, this paper utilizes pre-processing and a landmark detection method in order to find the important landmarks of faces. Thereafter, four different frameworks are proposed which are inspired by state-of-the-art gender recognition systems. The first framework extracts features using Local Binary Pattern (LBP) and Principal Component Analysis (PCA) and uses back propagation neural network. The second framework uses Gabor filters, PCA, and kernel Support Vector Machine (SVM). The third framework uses lower part of faces as input and classifies them using kernel SVM. The fourth framework uses Linear Discriminant Analysis (LDA) in order to classify the side outline landmarks of faces. Finally, the four decisions of frameworks are fused using weighted voting. This paper takes advantage of both texture and geometrical information, the two dominant types of information in facial gender recognition. Experimental results show the power and effectiveness of the proposed method. This method obtains recognition rate of 94% for neutral faces of FEI face dataset, which is equal to state-of-the-art rate for this dataset.
Cross-database non-frontal expression recognition is a very meaningful but rather difficult subject in the fields of computer vision and affect computing. In this paper, we proposed a novel transductive deep transfer learning architecture based on widely used VGGface16-Net for this problem. In this framework, the VGGface16-Net is used to jointly learn an common optimal nonlinear discriminative features from the non-frontal facial expression samples between the source and target databases and then we design a novel transductive transfer layer to deal with the cross-database non-frontal facial expression classification task. In order to validate the performance of the proposed transductive deep transfer learning networks, we present extensive crossdatabase experiments on two famous available facial expression databases, namely the BU-3DEF and the Multi-PIE database. The final experimental results show that our transductive deep transfer network outperforms the state-of-the-art cross-database facial expression recognition methods.
Numerous recent studies have demonstrated how Deep Neural Network (DNN) classifiers can be fooled by adversarial examples, in which an attacker adds perturbations to an original sample, causing the classifier to misclassify the sample. Adversarial attacks that render DNNs vulnerable in real life represent a serious threat, given the consequences of improperly functioning autonomous vehicles, malware filters, or biometric authentication systems. In this paper, we apply Fast Gradient Sign Method to introduce perturbations to a facial image dataset and then test the output on a different classifier that we trained ourselves, to analyze transferability of this method. Next, we craft a variety of different attack algorithms on a facial image dataset, with the intention of developing untargeted black-box approaches assuming minimal adversarial knowledge, to further assess the robustness of DNNs in the facial recognition realm. We explore modifying single optimal pixels by a large amount, or modifying all pixels by a smaller amount, or combining these two attack approaches. While our single-pixel attacks achieved about a 15% average decrease in classifier confidence level for the actual class, the all-pixel attacks were more successful and achieved up to an 84% average decrease in confidence, along with an 81.6% misclassification rate, in the case of the attack that we tested with the highest levels of perturbation. Even with these high levels of perturbation, the face images remained fairly clearly identifiable to a human. We hope our research may help to advance the study of adversarial attacks on DNNs and defensive mechanisms to counteract them, particularly in the facial recognition domain.
Plenty of effective methods have been proposed for face recognition during the past decade. Although these methods differ essentially in many aspects, a common practice of them is to specifically align the facial area based on the prior knowledge of human face structure before feature extraction. In most systems, the face alignment module is implemented independently. This has actually caused difficulties in the designing and training of end-to-end face recognition models. In this paper we study the possibility of alignment learning in end-to-end face recognition, in which neither prior knowledge on facial landmarks nor artificially defined geometric transformations are required. Specifically, spatial transformer layers are inserted in front of the feature extraction layers in a Convolutional Neural Network (CNN) for face recognition. Only human identity clues are used for driving the neural network to automatically learn the most suitable geometric transformation and the most appropriate facial area for the recognition task. To ensure reproducibility, our model is trained purely on the publicly available CASIA-WebFace dataset, and is tested on the Labeled Face in the Wild (LFW) dataset. We have achieved a verification accuracy of 99.08\% which is comparable to state-of-the-art single model based methods.
Facial Expression Recognition(FER) is one of the most important topic in Human-Computer interactions(HCI). In this work we report details and experimental results about a facial expression recognition method based on state-of-the-art methods. We fine-tuned a SeNet deep learning architecture pre-trained on the well-known VGGFace2 dataset, on the AffWild2 facial expression recognition dataset. The main goal of this work is to define a baseline for a novel method we are going to propose in the near future. This paper is also required by the Affective Behavior Analysis in-the-wild (ABAW) competition in order to evaluate on the test set this approach. The results reported here are on the validation set and are related on the Expression Challenge part (seven basic emotion recognition) of the competition. We will update them as soon as the actual results on the test set will be published on the leaderboard.
The activations of Facial Action Units (AUs) mutually influence one another. While the relationship between a pair of AUs can be complex and unique, existing approaches fail to specifically and explicitly represent such cues for each pair of AUs in each facial display. This paper proposes an AU relationship modelling approach that deep learns a unique graph to explicitly describe the relationship between each pair of AUs of the target facial display. Our approach first encodes each AU's activation status and its association with other AUs into a node feature. Then, it learns a pair of multi-dimensional edge features to describe multiple task-specific relationship cues between each pair of AUs. During both node and edge feature learning, our approach also considers the influence of the unique facial display on AUs' relationship by taking the full face representation as an input. Experimental results on BP4D and DISFA datasets show that both node and edge feature learning modules provide large performance improvements for CNN and transformer-based backbones, with our best systems achieving the state-of-the-art AU recognition results. Our approach not only has a strong capability in modelling relationship cues for AU recognition but also can be easily incorporated into various backbones. Our PyTorch code is made available.