Human emotions can be inferred from facial expressions. However, the annotations of facial expressions are often highly noisy in common emotion coding models, including categorical and dimensional ones. To reduce the human labelling effort for multi-task labels, we introduce a new problem of facial emotion recognition with noisy multi-task annotations. For this new problem, we propose a formulation from the perspective of joint distribution matching, which aims to learn more reliable associations between raw facial images and multi-task labels, thereby reducing the influence of noise. In our formulation, we exploit a new method to enable emotion prediction and joint distribution learning in a unified adversarial learning game. Extensive experiments study the real-world setups of the proposed new problem and demonstrate the clear superiority of the proposed method over state-of-the-art competing methods on both the synthetic noisy-labeled CIFAR-10 dataset and the practical noisy multi-task labeled RAF and AffectNet datasets.
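The abstract above gives no implementation details; the following is a minimal PyTorch-style sketch of one way a unified adversarial game over the joint (image, multi-task label) distribution could be set up. All module names, layer sizes, and the 7-class plus valence/arousal label layout are illustrative assumptions, not taken from the paper.

import torch
import torch.nn as nn

# Illustrative predictor: maps a face image to categorical (7-way) and
# dimensional (valence/arousal) emotion predictions, i.e., multi-task labels.
class Predictor(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256), nn.ReLU())
        self.cat_head = nn.Linear(256, 7)   # categorical emotion logits
        self.dim_head = nn.Linear(256, 2)   # valence/arousal

    def forward(self, x):
        h = self.backbone(x)
        return self.cat_head(h), self.dim_head(h)

# Discriminator over joint (image, multi-task label) pairs: annotated pairs
# should score "real", predicted pairs "fake", pushing the predictor to match
# the joint distribution rather than fit each noisy label individually.
class JointDiscriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 * 64 * 64 + 7 + 2, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, x, cat, dim):
        return self.net(torch.cat([x.flatten(1), cat, dim], dim=1))

bce = nn.BCEWithLogitsLoss()
P, D = Predictor(), JointDiscriminator()
x = torch.randn(8, 3, 64, 64)                      # batch of face images (dummy data)
y_cat = torch.eye(7)[torch.randint(0, 7, (8,))]    # noisy categorical labels (one-hot)
y_dim = torch.rand(8, 2)                           # noisy dimensional labels

cat_logits, dim_pred = P(x)
p_cat = cat_logits.softmax(dim=1)

# Discriminator step: annotated pairs are "real", predicted pairs are "fake".
real_score = D(x, y_cat, y_dim)
fake_score = D(x, p_cat.detach(), dim_pred.detach())
d_loss = bce(real_score, torch.ones_like(real_score)) + \
         bce(fake_score, torch.zeros_like(fake_score))

# Predictor step: fool the discriminator so predicted pairs match the joint distribution.
g_loss = bce(D(x, p_cat, dim_pred), torch.ones_like(real_score))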
Temporal information can provide useful features for recognizing facial expressions. However, manually designing useful features requires considerable effort. In this paper, to reduce this effort, a deep learning technique, regarded as a tool for automatically extracting useful features from raw data, is adopted. Our deep network is based on two different models. The first deep network extracts temporal geometry features from temporal facial landmark points, while the other deep network extracts temporal appearance features from image sequences. These two models are combined to boost the performance of facial expression recognition. Through several experiments, we show that the two models cooperate with each other. As a result, we achieve superior performance to other state-of-the-art methods on the CK+ and Oulu-CASIA databases. Furthermore, one of the main contributions of this paper is that our deep network captures the facial action points automatically.
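As a rough illustration of the two-model idea described above, the sketch below pairs a landmark-based temporal geometry stream with a 3D-convolutional appearance stream and fuses their class scores. The network sizes, the GRU and 3D-convolution choices, and the fusion weight are assumptions for illustration only; the paper's actual architectures may differ.

import torch
import torch.nn as nn

NUM_CLASSES = 7  # illustrative number of expression classes

# Temporal geometry stream: consumes a sequence of facial landmark coordinates.
class GeometryNet(nn.Module):
    def __init__(self, num_landmarks=68, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(input_size=num_landmarks * 2, hidden_size=hidden, batch_first=True)
        self.fc = nn.Linear(hidden, NUM_CLASSES)

    def forward(self, landmarks):          # landmarks: (B, T, 68*2)
        _, h = self.rnn(landmarks)
        return self.fc(h[-1])

# Temporal appearance stream: consumes the image sequence with 3D convolutions.
class AppearanceNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1))
        self.fc = nn.Linear(16, NUM_CLASSES)

    def forward(self, frames):             # frames: (B, 1, T, H, W)
        return self.fc(self.conv(frames).flatten(1))

# Combination of the two models: a weighted sum of the streams' class scores.
class FusedModel(nn.Module):
    def __init__(self, alpha=0.5):
        super().__init__()
        self.geo, self.app, self.alpha = GeometryNet(), AppearanceNet(), alpha

    def forward(self, landmarks, frames):
        return self.alpha * self.geo(landmarks) + (1 - self.alpha) * self.app(frames)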
In recent years, periocular recognition has developed into a valuable biometric identification approach, especially in wild environments (for example, masked faces due to the COVID-19 pandemic) where facial recognition may not be applicable. This paper presents a new deep periocular recognition framework called attribute-based deep periocular recognition (ADPR), which predicts soft biometrics and incorporates the prediction into a periocular recognition algorithm to determine identity from periocular images with high accuracy. We propose an end-to-end framework that uses several shared convolutional neural network (CNN) layers (a common network) whose output feeds two separate dedicated branches (modality-dedicated layers); the first branch classifies periocular images while the second branch predicts soft biometrics. Next, the features from these two branches are fused together for the final periocular recognition. The proposed method differs from existing methods in that it not only uses a shared CNN feature space to train these two tasks jointly, but also fuses the predicted soft biometric features with the periocular features during training to improve the overall periocular recognition performance. Our proposed model is extensively evaluated on four different publicly available datasets. Experimental results indicate that our soft-biometric-based periocular recognition approach outperforms other state-of-the-art methods for periocular recognition in wild environments.
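A minimal sketch of the shared-trunk, two-branch, fused-recognition layout described above is given below, assuming a PyTorch implementation. The layer sizes, the number of identities, and the soft-biometric attribute count are placeholders; only the overall structure (common network, dedicated branches, fusion for recognition) follows the abstract.

import torch
import torch.nn as nn

class ADPRSketch(nn.Module):
    """Illustrative layout: shared CNN trunk -> two dedicated branches -> fused recognition."""
    def __init__(self, num_identities=1000, num_soft_attrs=4):
        super().__init__()
        self.shared = nn.Sequential(                        # common network
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten())
        self.id_branch = nn.Sequential(nn.Linear(32 * 16, 256), nn.ReLU())
        self.soft_branch = nn.Sequential(nn.Linear(32 * 16, 64), nn.ReLU())
        self.id_head = nn.Linear(256, num_identities)       # periocular identity logits
        self.soft_head = nn.Linear(64, num_soft_attrs)      # soft biometric predictions
        self.fused_head = nn.Linear(256 + 64, num_identities)  # recognition on fused features

    def forward(self, x):
        h = self.shared(x)
        f_id, f_soft = self.id_branch(h), self.soft_branch(h)
        fused = torch.cat([f_id, f_soft], dim=1)
        return self.id_head(f_id), self.soft_head(f_soft), self.fused_head(fused)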
It is human nature to give prime importance to facial appearance. Often, to look good is to feel good. Moreover, facial features are unique to every individual on this planet, which makes them a source of vital information. This work proposes a framework named E-Pro for the detection and recognition of faces from facial images. E-Pro has potential applications in various domains, namely attendance, surveillance, crowd monitoring, biometric-based authentication, etc. E-Pro is developed here as a mobile application that aims to aid lecturers in marking attendance in a classroom by detecting and recognizing the faces of students from a picture taken through the app. E-Pro has been developed using the Google Firebase Face Recognition APIs, which use Euler angles and a probabilistic model. E-Pro has been tested on stock images, and the experimental results are promising.
This study demonstrates how facial biometrics, acquired using multi-spectral sensors such as RGB, depth, and infrared, assist data accumulation in the process of authorizing users of automated and semi-automated access systems. These data serve the purposes of person authentication as well as facial temperature estimation. We utilize depth data acquired with an inexpensive RGB-D sensor to estimate the head pose of a subject. This allows the selection of video frames containing a frontal-view head pose for face recognition and face temperature reading. Using the frontal-view frames improves the efficiency of face recognition, while the corresponding synchronized IR video frames allow for more efficient temperature estimation for facial regions of interest. In addition, this study reports emerging applications of biometrics in biomedical and health care solutions, including surveys of recent pilot projects involving new sensors of biometric data and new applications of human physiological and behavioral biometrics. It also shows new and promising horizons for using biometrics in natural and contactless control interfaces for surgical control, rehabilitation, and accessibility.
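As a small illustration of the frontal-frame selection step described above, the following sketch keeps only the frames whose estimated yaw and pitch fall within a threshold of a frontal pose. The pose-estimation step itself, the 10-degree thresholds, and the function names are assumptions made for illustration.

import numpy as np

def select_frontal_frames(poses, yaw_thresh=10.0, pitch_thresh=10.0):
    """Return indices of frames whose estimated head pose is close to frontal.

    poses: array of shape (N, 3) with per-frame (yaw, pitch, roll) in degrees,
           e.g. estimated from the RGB-D sensor's depth stream.
    """
    poses = np.asarray(poses)
    frontal = (np.abs(poses[:, 0]) < yaw_thresh) & (np.abs(poses[:, 1]) < pitch_thresh)
    return np.flatnonzero(frontal)

# The selected RGB frames would be passed to face recognition, and the
# time-synchronized IR frames at the same indices to temperature estimation.
frontal_idx = select_frontal_frames(np.random.uniform(-30, 30, size=(100, 3)))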
Emotion recognition has become an important field of research in the human-computer interaction domain. The latest advancements in the field show that combining visual with audio information leads to better results than using a single source of information. From a visual point of view, a human emotion can be recognized by analyzing the facial expression of the person. More precisely, the human emotion can be described through a combination of several Facial Action Units. In this paper, we propose a system based on deep Convolutional Neural Networks that is able to recognize emotions with a high accuracy rate and in real time. To increase the accuracy of the recognition system, we also analyze the speech data and fuse the information coming from both sources, i.e., visual and audio. Experimental results show the effectiveness of the proposed scheme for emotion recognition and the importance of combining visual with audio data.
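A minimal sketch of the audio-visual fusion idea is shown below, assuming PyTorch, a face-image CNN, a small network over MFCC-style speech features, and a fixed late-fusion weight. All of these specifics are illustrative assumptions rather than the exact system described in the paper.

import torch
import torch.nn as nn

class AudioVisualFusion(nn.Module):
    """Late fusion of per-modality emotion probabilities (weights are illustrative)."""
    def __init__(self, num_emotions=7, visual_weight=0.6):
        super().__init__()
        self.visual_cnn = nn.Sequential(                 # stand-in for the face-image CNN
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, num_emotions))
        self.audio_net = nn.Sequential(                  # stand-in for a speech-feature model
            nn.Linear(40, 64), nn.ReLU(), nn.Linear(64, num_emotions))
        self.w = visual_weight

    def forward(self, face, mfcc):                       # face: (B,1,H,W), mfcc: (B,40)
        p_visual = self.visual_cnn(face).softmax(dim=1)
        p_audio = self.audio_net(mfcc).softmax(dim=1)
        return self.w * p_visual + (1 - self.w) * p_audio  # fused emotion probabilities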
Depth information has been proven useful for face recognition. However, existing depth-image-based face recognition methods still suffer from noisy depth values and varying poses and expressions. In this paper, we propose a novel method for normalizing facial depth images to frontal pose and neutral expression and extracting robust features from the normalized depth images. The method is implemented via two deep convolutional neural networks (DCNNs): a normalization network ($Net_{N}$) and a feature extraction network ($Net_{F}$). Given a facial depth image, $Net_{N}$ first converts it to an HHA image, from which the 3D face is reconstructed via a DCNN. $Net_{N}$ then generates a pose-and-expression normalized (PEN) depth image from the reconstructed 3D face. The PEN depth image is finally passed to $Net_{F}$, which extracts a robust feature representation via another DCNN for face recognition. Our preliminary evaluation results demonstrate the superiority of the proposed method in recognizing faces of arbitrary poses and expressions from depth images.
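The following sketch outlines the two-network pipeline described above in PyTorch: a stand-in $Net_{N}$ that maps an HHA-encoded depth image to a PEN depth image, and a stand-in $Net_{F}$ that extracts an identity feature from it. The layer choices and the depth-to-HHA helper are placeholders, and the intermediate 3D reconstruction is folded into the $Net_{N}$ stand-in for brevity.

import torch
import torch.nn as nn

class NetN(nn.Module):
    """Stand-in for the normalization network: HHA image -> PEN depth image."""
    def __init__(self):
        super().__init__()
        self.encode = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())
        self.decode = nn.Conv2d(32, 1, 3, padding=1)   # frontal, neutral-expression depth

    def forward(self, hha):                            # hha: (B, 3, H, W)
        return self.decode(self.encode(hha))

class NetF(nn.Module):
    """Stand-in for the feature extraction network on PEN depth images."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, feat_dim))

    def forward(self, pen_depth):
        return self.net(pen_depth)

def recognize(depth_to_hha, net_n, net_f, depth_img):
    hha = depth_to_hha(depth_img)          # depth-to-HHA encoding (helper not shown here)
    pen = net_n(hha)                       # pose-and-expression normalized depth image
    return net_f(pen)                      # identity feature used for matching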
Facial emotion recognition is an essential and important aspect of the field of human-machine interaction. Past research on facial emotion recognition has focused on the laboratory environment. However, it faces many challenges in real-world conditions, such as illumination changes, large pose variations, and partial or full occlusions. These challenges lead to different face areas having different degrees of sharpness and completeness. Inspired by this fact, we focus on the authenticity of predictions generated from different face areas. For example, if only the mouth area is available and the emotion classifier predicts happiness, the question is how to judge the authenticity of that prediction. This problem can be converted into the contribution of different face areas to different emotions. In this paper, we divide the whole face into six areas: the nose area, mouth area, eyes area, nose-to-mouth area, nose-to-eyes area, and mouth-to-eyes area. To obtain more convincing results, our experiments are conducted on three different databases: the facial expression recognition plus (FER+) dataset, the real-world affective faces database (RAF-DB), and the expression in-the-wild (ExpW) dataset. Through analysis of the classification accuracy, the confusion matrix, and the class activation map (CAM), we establish convincing results. To sum up, the contributions of this paper lie in two areas: 1) we visualize the face areas attended to in emotion recognition; 2) we analyze the contribution of different face areas to different emotions in real-world conditions through experimental analysis. Our findings can be combined with findings in psychology to promote the understanding of emotional expressions.
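Since the analysis above relies on class activation maps, the snippet below shows the standard CAM computation: the final convolutional feature maps are weighted by the classifier weights of the target emotion class and upsampled to image size. It is a generic sketch rather than the paper's exact implementation, and the tensor shapes are assumptions.

import torch
import torch.nn.functional as F

def class_activation_map(feature_maps, fc_weight, class_idx, out_size):
    """Standard CAM for a global-average-pooling classifier.

    feature_maps: (C, h, w) activations from the final conv layer
    fc_weight:    (num_classes, C) weights of the linear classifier
    class_idx:    target emotion class
    out_size:     (H, W) of the input face image
    """
    cam = torch.einsum('c,chw->hw', fc_weight[class_idx], feature_maps)
    cam = F.relu(cam)
    cam = cam / (cam.max() + 1e-8)                           # normalize to [0, 1]
    return F.interpolate(cam[None, None], size=out_size,
                         mode='bilinear', align_corners=False)[0, 0]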
Recognizing a face based on its attributes is an easy task for a human to perform, as it is a cognitive process. In recent years, face recognition has been achieved with different kinds of facial features, used either separately or in combination. Currently, feature fusion methods and parallel methods are employed, integrating multiple feature sets at different levels. However, such integration and combinational methods do not guarantee better results. Hence, to achieve better results, a feature fusion model with multiple weighted facial attribute sets is selected. For this feature model, face images from the predefined Olivetti Research Laboratory (ORL) dataset are used with different methods: Principal Component Analysis (PCA) based eigenfeature extraction, Discrete Cosine Transform (DCT) based feature extraction, histogram-based feature extraction, and simple intensity-based features. The feature sets obtained from these methods were compared and tested for accuracy. In this work, we develop a model that uses the above feature extraction techniques with different weights to attain better accuracy. The results show that selecting the optimum weight for a particular feature leads to an improvement in the recognition rate.
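The weighted fusion idea can be sketched as follows: each extracted feature set (PCA eigenfeatures, DCT coefficients, histogram, raw intensities) is normalized, scaled by its weight, concatenated, and matched by nearest neighbour. The normalization, the weighting scheme, and the function names are illustrative assumptions, not the paper's exact procedure.

import numpy as np

def weighted_fusion(feature_sets, weights):
    """Concatenate L2-normalized feature vectors, each scaled by its weight.

    feature_sets: list of 1-D feature vectors (e.g., PCA eigenfeatures, DCT
                  coefficients, intensity histogram, raw intensities)
    weights:      one weight per feature set, tuned for recognition accuracy
    """
    parts = []
    for feat, w in zip(feature_sets, weights):
        feat = np.asarray(feat, dtype=float)
        feat = feat / (np.linalg.norm(feat) + 1e-8)   # normalize so weights are comparable
        parts.append(w * feat)
    return np.concatenate(parts)

def nearest_neighbour_id(probe, gallery):
    """Identify the probe as the gallery entry with the smallest Euclidean distance."""
    dists = [np.linalg.norm(probe - g) for g in gallery]
    return int(np.argmin(dists))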
Recent advances in domain adaptation, especially those applied to heterogeneous facial recognition, typically rely upon restrictive Euclidean loss functions (e.g., the $L_2$ norm), which perform best when images from two different domains (e.g., visible and thermal) are co-registered and temporally synchronized. This paper proposes a novel domain adaptation framework that combines a new feature mapping sub-network with existing deep feature models based on modified network architectures (e.g., VGG16 or ResNet50). The framework is optimized by introducing new cross-domain identity and domain invariance loss functions for thermal-to-visible face recognition, which alleviate the requirement for precisely co-registered and synchronized imagery. We provide extensive analysis of both the features and loss functions used, and compare the proposed domain adaptation framework with state-of-the-art feature-based domain adaptation models on a difficult dataset containing facial imagery collected at varying ranges, poses, and expressions. Moreover, we analyze the viability of the proposed framework for more challenging tasks, such as non-frontal thermal-to-visible face recognition.
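As a hedged illustration of the ideas above, the sketch below uses a small feature-mapping sub-network together with a contrastive stand-in for the cross-domain identity loss and a simple moment-matching stand-in for the domain invariance loss. These are generic substitutes chosen for clarity; the paper's actual loss formulations and network dimensions are not given in the abstract.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureMapper(nn.Module):
    """Illustrative mapping sub-network placed on top of a deep feature model (e.g., 2048-D)."""
    def __init__(self, in_dim=2048, out_dim=512):
        super().__init__()
        self.map = nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU(),
                                 nn.Linear(out_dim, out_dim))

    def forward(self, feats):
        return F.normalize(self.map(feats), dim=1)

def cross_domain_identity_loss(f_thermal, f_visible, same_identity, margin=0.5):
    """Pull thermal/visible features of the same identity together and push
    different identities apart (a contrastive stand-in for the paper's loss)."""
    d = (f_thermal - f_visible).pow(2).sum(dim=1)
    same = same_identity.float()                      # 1 if the pair shares an identity, else 0
    return (same * d + (1 - same) * F.relu(margin - d.sqrt()).pow(2)).mean()

def domain_invariance_loss(f_thermal, f_visible):
    """Encourage the two domains' feature distributions to match in mean (a
    simple moment-matching stand-in for a domain-invariance objective)."""
    return (f_thermal.mean(dim=0) - f_visible.mean(dim=0)).pow(2).sum()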