Facial landmark detection, head pose estimation, and facial deformation analysis are typical facial behavior analysis tasks in computer vision. The existing methods usually perform each task independently and sequentially, ignoring their interactions. To tackle this problem, we propose a unified framework for simultaneous facial landmark detection, head pose estimation, and facial deformation analysis, and the proposed model is robust to facial occlusion. Following a cascade procedure augmented with model-based head pose estimation, we iteratively update the facial landmark locations, facial occlusion, head pose, and facial deformation until convergence. The experimental results on benchmark databases demonstrate the effectiveness of the proposed method for simultaneous facial landmark detection, head pose estimation, and facial deformation estimation, even when the images contain facial occlusion.
Human face analysis is an important task in computer vision. According to cognitive-psychological studies, facial dynamics can provide crucial cues for face analysis. The motion of a local facial region during a facial expression is related to the motion of other local facial regions. In this paper, a novel deep learning approach, named the facial dynamics interpreter network, is proposed to interpret the important relations between local dynamics for estimating facial traits from expression sequences. The facial dynamics interpreter network is designed to encode relational importance, which is used for interpreting the relations between facial local dynamics and for estimating facial traits. The effectiveness of the proposed method is verified by comparative experiments. The important relations between facial local dynamics are investigated with the proposed facial dynamics interpreter network in gender classification and age estimation. Moreover, experimental results show that the proposed method outperforms the state-of-the-art methods in gender classification and age estimation.
Facial palsy is unilateral facial nerve weakness or paralysis of rapid onset with unknown causes. Automatically estimating facial palsy severity can be helpful for the diagnosis and treatment of people suffering from it across the world. In this work, we develop and experiment with a novel model for estimating facial palsy severity. For this, an effective facial Action Unit (AU) detection technique is incorporated into our model, where AUs refer to a unique set of facial muscle movements used to describe almost every anatomically possible facial expression. We propose a novel Adaptive Local-Global Relational Network (ALGRNet) for facial AU detection and use it to classify facial paralysis severity. ALGRNet consists of three novel structures: (i) an adaptive region learning module that learns the adaptive muscle regions based on the detected landmarks; (ii) a skip-BiLSTM that models the latent relationships among local AUs; and (iii) a feature fusion & refining module that investigates the complementarity between the local and global face. Quantitative results on two AU benchmarks, i.e., BP4D and DISFA, demonstrate that ALGRNet can achieve promising AU detection accuracy. We further demonstrate the effectiveness of its application to facial paralysis estimation by migrating ALGRNet to a facial paralysis dataset collected and annotated by medical professionals.
Meaningful facial parts can convey key cues for both facial action unit detection and expression prediction. Textured 3D face scans can provide both detailed 3D geometric shape and 2D texture appearance cues of the face, which are beneficial for Facial Expression Recognition (FER). However, accurate extraction of facial parts and their fusion are challenging tasks. In this paper, a novel system for 3D FER is designed based on accurate facial parts extraction and deep feature fusion of facial parts. In particular, each textured 3D face scan is first represented as a 2D texture map and a depth map with one-to-one dense correspondence. Then, the facial parts of both the texture map and the depth map are extracted using a novel 4-stage process consisting of facial landmark localization, facial rotation correction, facial resizing, facial parts bounding box extraction, and post-processing procedures. Finally, deep fusion Convolutional Neural Network (CNN) features of all facial parts are learned from both texture maps and depth maps, respectively, and nonlinear SVMs are used for expression prediction. Experiments are conducted on the BU-3DFE database, demonstrating the effectiveness of combining different facial parts, texture, and depth cues, and reporting state-of-the-art results in comparison with all existing methods under the same setting.
Previous approaches to modeling and analyzing facial expressions use three different techniques: facial action units, geometric features, and graph-based modelling. However, previous approaches have treated these techniques separately, although they are interrelated. Facial expression analysis is significantly improved by utilizing the mappings between the major geometric features involved in facial expressions and the subset of facial action units whose presence or absence is unique to a facial expression. This paper combines dimension reduction techniques and image classification with search space pruning driven by this unique subset of facial action units. Performance results on a publicly available facial expression database show a 70% improvement in time performance while maintaining emotion recognition correctness.
Deep learning based facial expression recognition (FER) has received a lot of attention in the past few years. Most of the existing deep learning based FER methods do not consider domain knowledge well, and thereby fail to extract representative features. In this work, we propose a novel FER framework, named Facial Motion Prior Networks (FMPN). In particular, we introduce an additional branch to generate a facial mask so as to focus on the regions where facial muscles move. To guide the facial mask learning, we propose to incorporate prior domain knowledge by using the average differences between neutral faces and the corresponding expressive faces as the guidance. Extensive experiments on four facial expression benchmark datasets demonstrate the effectiveness of the proposed method compared with the state-of-the-art approaches.
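The guidance signal described above (average differences between neutral and expressive faces) can be illustrated with a minimal sketch. This is not the FMPN implementation: the function name `motion_prior_mask`, the use of absolute differences, the peak normalization to [0, 1], and the toy 2x2 "faces" are all assumptions made for illustration, and the inputs are assumed to be aligned grayscale images of equal size.

```python
from typing import List, Sequence

Image = Sequence[Sequence[float]]  # aligned grayscale face, rows of pixel intensities


def motion_prior_mask(neutral_faces: Sequence[Image],
                      expressive_faces: Sequence[Image]) -> List[List[float]]:
    """Average the per-pixel absolute difference over (neutral, expressive) pairs.

    Hypothetical helper: absolute value and peak normalization are assumptions,
    not details taken from the abstract.
    """
    height, width = len(neutral_faces[0]), len(neutral_faces[0][0])
    accumulated = [[0.0] * width for _ in range(height)]
    for neutral, expressive in zip(neutral_faces, expressive_faces):
        for i in range(height):
            for j in range(width):
                accumulated[i][j] += abs(expressive[i][j] - neutral[i][j])
    pair_count = len(neutral_faces)
    mean_diff = [[value / pair_count for value in row] for row in accumulated]
    peak = max(max(row) for row in mean_diff)
    if peak == 0:
        return mean_diff  # no motion anywhere; return the all-zero map
    # Scale so the most strongly moving pixel has weight 1.
    return [[value / peak for value in row] for row in mean_diff]


# Toy data: two pairs of 2x2 images, invented purely for illustration.
neutral = [[[0, 0], [0, 0]], [[0, 0], [0, 0]]]
expressive = [[[0, 0], [0, 4]], [[0, 2], [0, 4]]]
mask = motion_prior_mask(neutral, expressive)
```

In a setup like this, the resulting map would serve only as a training target for the mask-generating branch; how the guidance is actually weighted or combined is not specified in the abstract.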
Extraction of discriminative features from salient facial patches plays a vital role in effective facial expression recognition. The accurate detection of facial landmarks improves the localization of the salient patches on face images. This paper proposes a novel framework for expression recognition that uses appearance features of selected facial patches. A few prominent facial patches that are active during emotion elicitation are extracted, depending on the position of the facial landmarks. These active patches are further processed to obtain the salient patches, which contain discriminative features for classifying each pair of expressions, so that different facial patches are selected as salient for different pairs of expression classes. A one-against-one classification method is adopted using these features. In addition, an automated learning-free facial landmark detection technique is proposed, which achieves performance similar to that of other state-of-the-art landmark detection methods, yet requires significantly less execution time. The proposed method is found to perform consistently well at different resolutions, hence providing a solution for expression recognition in low-resolution images. Experiments on the CK+ and JAFFE facial expression databases show the effectiveness of the proposed system.
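The one-against-one scheme with pair-specific salient patches can be sketched as a simple voting loop. The sketch below is illustrative only: the function `one_against_one_predict`, the patch names, the feature values, and the threshold "classifiers" are hypothetical stand-ins, not the paper's features or trained models.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

Pair = Tuple[str, str]


def one_against_one_predict(
    patch_features: Dict[str, float],
    classes: Sequence[str],
    salient_patches: Dict[Pair, List[str]],
    pair_classifiers: Dict[Pair, Callable[[List[float]], str]],
) -> str:
    """Run one binary classifier per class pair and return the majority vote.

    Each pair classifier only sees features from the patches marked as
    salient for that particular pair of expressions.
    """
    votes: Counter = Counter()
    for pair in combinations(classes, 2):
        selected = [patch_features[name] for name in salient_patches[pair]]
        votes[pair_classifiers[pair](selected)] += 1
    # On a tie, Counter.most_common keeps the class that received a vote first.
    return votes.most_common(1)[0][0]


# Toy setup: all names, values, and thresholds are invented for illustration.
classes = ["happy", "sad", "surprise"]
patch_features = {"mouth": 0.9, "brow": 0.1, "eye": 0.2}
salient_patches = {
    ("happy", "sad"): ["mouth"],
    ("happy", "surprise"): ["mouth", "eye"],
    ("sad", "surprise"): ["brow"],
}
pair_classifiers = {
    ("happy", "sad"): lambda f: "happy" if f[0] > 0.5 else "sad",
    ("happy", "surprise"): lambda f: "surprise" if f[1] > 0.5 else "happy",
    ("sad", "surprise"): lambda f: "sad" if f[0] > 0.5 else "surprise",
}
prediction = one_against_one_predict(
    patch_features, classes, salient_patches, pair_classifiers
)
```

With k expression classes this scheme needs k(k-1)/2 binary classifiers, which is what allows each one to use its own subset of patches.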
Although much progress has been made in the facial expression analysis field, facial occlusions are still challenging. The main innovation of this contribution lies in exploiting the specificities of facial movement propagation to recognize expressions in the presence of significant occlusions. The movement induced by an expression extends beyond the movement epicenter. Thus, the movement occurring in an occluded region propagates towards neighboring visible regions. In the presence of occlusions, we compute, per expression, the importance of each unoccluded facial region, and we construct adapted facial frameworks that boost the performance of each per-expression binary classifier. The outputs of the expression-dependent binary classifiers are then aggregated and fed into a fusion process that aims at constructing, per occlusion, a unique model that recognizes all the facial expressions considered. The evaluations highlight the robustness of this approach in the presence of significant facial occlusions.
The cascade regression framework has been shown to be effective for facial landmark detection. It starts from an initial face shape and gradually predicts the face shape update from the local appearance features to generate the facial landmark locations in the next iteration until convergence. In this paper, we improve upon the cascade regression framework and propose the Constrained Joint Cascade Regression Framework (CJCRF) for simultaneous facial action unit recognition and facial landmark detection, which are two related face analysis tasks that are seldom exploited together. In particular, we first learn the relationships among facial action units and face shapes as a constraint. Then, in the proposed constrained joint cascade regression framework, with the help of the constraint, we iteratively update the facial landmark locations and the action unit activation probabilities until convergence. Experimental results demonstrate that the intertwined relationships of facial action units and face shapes boost the performance of both facial action unit recognition and facial landmark detection. The experimental results also demonstrate the effectiveness of the proposed method compared to the state-of-the-art works.
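The generic cascade regression loop described at the start of this abstract (initial shape, features extracted around the current shape, predicted shape update, repeat) can be sketched as follows. This covers only the plain cascade, not the constrained joint framework with action units. The function name, the `tol` stopping threshold, and the toy feature extractor and stage regressors are assumptions; in particular, the toy extractor reads the true shape directly, which a real appearance-based extractor cannot do.

```python
from typing import Callable, List, Sequence

Shape = List[float]  # flattened landmark coordinates [x1, y1, x2, y2, ...]


def cascade_regression(
    initial_shape: Sequence[float],
    extract_features: Callable[[Sequence[float]], List[float]],
    regressors: Sequence[Callable[[List[float]], List[float]]],
    tol: float = 1e-6,
) -> Shape:
    """Refine a face shape with a sequence of stage regressors.

    At each stage, features are computed at the current shape estimate and
    the stage regressor maps them to an additive shape update.
    """
    shape = list(initial_shape)
    for regress in regressors:
        features = extract_features(shape)
        update = regress(features)
        shape = [s + d for s, d in zip(shape, update)]
        if max(abs(d) for d in update) < tol:
            break  # updates have become negligible; treat as converged
    return shape


# Toy stand-ins (assumptions): an oracle feature extractor that returns the
# offset to the true shape, and stage regressors that correct half of it.
true_shape = [10.0, 20.0, 30.0, 40.0]


def toy_features(shape: Sequence[float]) -> List[float]:
    return [t - s for t, s in zip(true_shape, shape)]


def halve(features: List[float]) -> List[float]:
    return [0.5 * f for f in features]


estimate = cascade_regression([0.0, 0.0, 0.0, 0.0], toy_features, [halve] * 10)
```

In this toy setting each stage halves the remaining error, so ten stages leave a residual of about 0.1% of the initial offset. In practice the stage regressors are learned from training data and the features come from image patches around the current landmarks.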
One-sided facial paralysis causes uneven movements of the facial muscles on the two sides of the face. Physicians currently assess facial asymmetry in a subjective manner based on their clinical experience. This paper proposes a novel method to provide an objective and quantitative asymmetry score for frontal faces. Our metric has the potential to help physicians with diagnosis as well as with monitoring the rehabilitation of patients with one-sided facial paralysis. A deep learning based landmark detection technique is used to estimate style-invariant facial landmark points, and dense optical flow is used to generate motion maps from a short sequence of frames. Six face regions are considered, corresponding to the left and right parts of the forehead, eyes, and mouth. Motion is computed and compared between the left and the right parts of each region of interest to estimate the symmetry score. For testing, asymmetric sequences are synthetically generated from a facial expression dataset. A score equation is developed to quantify symmetry in both symmetric and asymmetric face sequences.
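The left/right motion comparison can be illustrated with a small sketch. The abstract does not give the score equation, so the normalized-difference formula below, the function names, and the example motion values are assumptions chosen only to show the idea. Each region is assumed to be summarized by one non-negative motion value per side, for example a mean optical-flow magnitude.

```python
from typing import Dict, Tuple


def region_symmetry(left_motion: float, right_motion: float,
                    eps: float = 1e-8) -> float:
    """Return about 1.0 when both sides move equally and about 0.0 when only one moves.

    Assumed formula: 1 - |L - R| / (L + R). It is not taken from the paper.
    """
    return 1.0 - abs(left_motion - right_motion) / (left_motion + right_motion + eps)


def symmetry_score(motions: Dict[str, Tuple[float, float]]) -> float:
    """Average the per-region symmetry over all regions of interest."""
    scores = [region_symmetry(left, right) for left, right in motions.values()]
    return sum(scores) / len(scores)


# Hypothetical (left, right) motion summaries for the three paired regions.
symmetric_face = {"forehead": (1.0, 1.0), "eyes": (2.0, 2.0), "mouth": (3.0, 3.0)}
asymmetric_face = {"forehead": (1.0, 1.0), "eyes": (2.0, 2.0), "mouth": (3.0, 0.0)}

symmetric_result = symmetry_score(symmetric_face)
asymmetric_result = symmetry_score(asymmetric_face)
```

Here the symmetric sequence scores 1.0, while the sequence with a motionless right mouth region scores about two thirds, since one of the three regions contributes almost nothing.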