Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"facial recognition": models, code, and papers

Emotive Response to a Hybrid-Face Robot and Translation to Consumer Social Robots

Dec 08, 2020
Maitreyee Wairagkar, Maria R Lima, Daniel Bazo, Richard Craig, Hugo Weissbart, Appolinaire C Etoundi, Tobias Reichenbach, Prashant Iyenger, Sneh Vaswani, Christopher James, Payam Barnaghi, Chris Melhuish, Ravi Vaidyanathan

We introduce the conceptual formulation, design, fabrication, control and commercial translation with IoT connection of a hybrid-face social robot and validation of human emotional response to its affective interactions. The hybrid-face robot integrates a 3D printed faceplate and a digital display to simplify conveyance of complex facial movements while providing the impression of three-dimensional depth for natural interaction. We map the space of potential emotions of the robot to specific facial feature parameters and characterise the recognisability of the humanoid hybrid-face robot's archetypal facial expressions. We introduce pupil dilation as an additional degree of freedom for conveyance of emotive states. Human interaction experiments demonstrate the ability to effectively convey emotion from the hybrid-robot face to human observers by mapping their neurophysiological electroencephalography (EEG) response to perceived emotional information and through interviews. Results show main hybrid-face robotic expressions can be discriminated with recognition rates above 80% and invoke human emotive response similar to that of actual human faces as measured by the face-specific N170 event-related potentials in EEG. The hybrid-face robot concept has been modified, implemented, and released in the commercial IoT robotic platform Miko (My Companion), an affective robot with facial and conversational features currently in use for human-robot interaction in children by Emotix Inc. We demonstrate that human EEG responses to Miko emotions are comparative to neurophysiological responses for actual human facial recognition. Finally, interviews show above 90% expression recognition rates in our commercial robot. We conclude that simplified hybrid-face abstraction conveys emotions effectively and enhances human-robot interaction.

* This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible 
Access Paper or Ask Questions

UV-GAN: Adversarial Facial UV Map Completion for Pose-invariant Face Recognition

Dec 13, 2017
Jiankang Deng, Shiyang Cheng, Niannan Xue, Yuxiang Zhou, Stefanos Zafeiriou

Recently proposed robust 3D face alignment methods establish either dense or sparse correspondence between a 3D face model and a 2D facial image. The use of these methods presents new challenges as well as opportunities for facial texture analysis. In particular, by sampling the image using the fitted model, a facial UV can be created. Unfortunately, due to self-occlusion, such a UV map is always incomplete. In this paper, we propose a framework for training Deep Convolutional Neural Network (DCNN) to complete the facial UV map extracted from in-the-wild images. To this end, we first gather complete UV maps by fitting a 3D Morphable Model (3DMM) to various multiview image and video datasets, as well as leveraging on a new 3D dataset with over 3,000 identities. Second, we devise a meticulously designed architecture that combines local and global adversarial DCNNs to learn an identity-preserving facial UV completion model. We demonstrate that by attaching the completed UV to the fitted mesh and generating instances of arbitrary poses, we can increase pose variations for training deep face recognition/verification models, and minimise pose discrepancy during testing, which lead to better performance. Experiments on both controlled and in-the-wild UV datasets prove the effectiveness of our adversarial UV completion model. We achieve state-of-the-art verification accuracy, $94.05\%$, under the CFP frontal-profile protocol only by combining pose augmentation during training and pose discrepancy reduction during testing. We will release the first in-the-wild UV dataset (we refer as WildUV) that comprises of complete facial UV maps from 1,892 identities for research purposes.

Access Paper or Ask Questions

Deep Representation of Facial Geometric and Photometric Attributes for Automatic 3D Facial Expression Recognition

Nov 10, 2015
Huibin Li, Jian Sun, Dong Wang, Zongben Xu, Liming Chen

In this paper, we present a novel approach to automatic 3D Facial Expression Recognition (FER) based on deep representation of facial 3D geometric and 2D photometric attributes. A 3D face is firstly represented by its geometric and photometric attributes, including the geometry map, normal maps, normalized curvature map and texture map. These maps are then fed into a pre-trained deep convolutional neural network to generate the deep representation. Then the facial expression prediction is simplyachieved by training linear SVMs over the deep representation for different maps and fusing these SVM scores. The visualizations show that the deep representation provides a complete and highly discriminative coding scheme for 3D faces. Comprehensive experiments on the BU-3DFE database demonstrate that the proposed deep representation can outperform the widely used hand-crafted descriptors (i.e., LBP, SIFT, HOG, Gabor) and the state-of-art approaches under the same experimental protocols.

Access Paper or Ask Questions

AI in Pursuit of Happiness, Finding Only Sadness: Multi-Modal Facial Emotion Recognition Challenge

Oct 24, 2019
Carl Norman

The importance of automated Facial Emotion Recognition (FER) grows the more common human-machine interactions become, which will only continue to increase dramatically with time. A common method to describe human sentiment or feeling is the categorical model the `7 basic emotions', consisting of `Angry', `Disgust', `Fear', `Happiness', `Sadness', `Surprise' and `Neutral'. The `Emotion Recognition in the Wild' (EmotiW) competition is now in its 7th year and has become the standard benchmark for measuring FER performance. The focus of this paper is the EmotiW sub-challenge of classifying videos in the `Acted Facial Expression in the Wild' (AFEW) dataset, consisting of both visual and audio modalities, into one of the above classes. Machine learning has exploded as a research topic in recent years, with advancements in `Deep Learning' a key part of this. Although Deep Learning techniques have been widely applied to the FER task by entrants in previous years, this paper has two main contributions: (i) to apply the latest `state-of-the-art' visual and temporal networks and (ii) exploring various methods of fusing features extracted from the visual and audio elements to enrich the information available to the final model making the prediction. There are a number of complex issues that arise when trying to classify emotions for `in-the-wild' video sequences, which the above two approaches attempt to directly address. There are some positive findings when comparing the results of this paper to past submissions, indicating that further research into the proposed methods and fine-tuning of the models deployed, could result in another step forwards in the field of automated FER.

Access Paper or Ask Questions

Neural Network Facial Authentication for Public Electric Vehicle Charging Station

Jun 19, 2021
Muhamad Amin Husni Abdul Haris, Sin Liang Lim

This study is to investigate and compare the facial recognition accuracy performance of Dlib ResNet against a K-Nearest Neighbour (KNN) classifier. Particularly when used against a dataset from an Asian ethnicity as Dlib ResNet was reported to have an accuracy deficiency when it comes to Asian faces. The comparisons are both implemented on the facial vectors extracted using the Histogram of Oriented Gradients (HOG) method and use the same dataset for a fair comparison. Authentication of a user by facial recognition in an electric vehicle (EV) charging station demonstrates a practical use case for such an authentication system.

* JETAP Vol.3 No.1 (2021) 17-21 
Access Paper or Ask Questions

Face Trees for Expression Recognition

Dec 05, 2021
Mojtaba Kolahdouzi, Alireza Sepas-Moghaddam, Ali Etemad

We propose an end-to-end architecture for facial expression recognition. Our model learns an optimal tree topology for facial landmarks, whose traversal generates a sequence from which we obtain an embedding to feed a sequential learner. The proposed architecture incorporates two main streams, one focusing on landmark positions to learn the structure of the face, while the other focuses on patches around the landmarks to learn texture information. Each stream is followed by an attention mechanism and the outputs are fed to a two-stream fusion component to perform the final classification. We conduct extensive experiments on two large-scale publicly available facial expression datasets, AffectNet and FER2013, to evaluate the efficacy of our approach. Our method outperforms other solutions in the area and sets new state-of-the-art expression recognition rates on these datasets.

Access Paper or Ask Questions

Omni-supervised Facial Expression Recognition: A Simple Baseline

May 18, 2020
Ping Liu, Yunchao Wei, Zibo Meng, Weihong Deng, Joey Tianyi Zhou, Yi Yang

In this paper, we target on advancing the performance in facial expression recognition (FER) by exploiting omni-supervised learning. The current state of the art FER approaches usually aim to recognize facial expressions in a controlled environment by training models with a limited number of samples. To enhance the robustness of the learned models for various scenarios, we propose to perform omni-supervised learning by exploiting the labeled samples together with a large number of unlabeled data. Particularly, we first employ MS-Celeb-1M as the facial-pool where around 5,822K unlabeled facial images are included. Then, a primitive model learned on a small number of labeled samples is adopted to select samples with high confidence from the facial-pool by conducting feature-based similarity comparison. We find the new dataset constructed in such an omni-supervised manner can significantly improve the generalization ability of the learned FER model and boost the performance consequently. However, as more training samples are used, more computation resources and training time are required, which is usually not affordable in many circumstances. To relieve the requirement of computational resources, we further adopt a dataset distillation strategy to distill the target task-related knowledge from the new mined samples and compressed them into a very small set of images. This distilled dataset is capable of boosting the performance of FER with few additional computational cost introduced. We perform extensive experiments on five popular benchmarks and a newly constructed dataset, where consistent gains can be achieved under various settings using the proposed framework. We hope this work will serve as a solid baseline and help ease future research in FER.

Access Paper or Ask Questions

Variable-state Latent Conditional Random Fields for Facial Expression Recognition and Action Unit Detection

Oct 13, 2015
Robert Walecki, Ognjen Rudovic, Vladimir Pavlovic, Maja Pantic

Automated recognition of facial expressions of emotions, and detection of facial action units (AUs), from videos depends critically on modeling of their dynamics. These dynamics are characterized by changes in temporal phases (onset-apex-offset) and intensity of emotion expressions and AUs, the appearance of which may vary considerably among target subjects, making the recognition/detection task very challenging. The state-of-the-art Latent Conditional Random Fields (L-CRF) framework allows one to efficiently encode these dynamics through the latent states accounting for the temporal consistency in emotion expression and ordinal relationships between its intensity levels, these latent states are typically assumed to be either unordered (nominal) or fully ordered (ordinal). Yet, such an approach is often too restrictive. For instance, in the case of AU detection, the goal is to discriminate between the segments of an image sequence in which this AU is active or inactive. While the sequence segments containing activation of the target AU may better be described using ordinal latent states, the inactive segments better be described using unordered (nominal) latent states, as no assumption can be made about their underlying structure (since they can contain either neutral faces or activations of non-target AUs). To address this, we propose the variable-state L-CRF (VSL-CRF) model that automatically selects the optimal latent states for the target image sequence. To reduce the model overfitting either the nominal or ordinal latent states, we propose a novel graph-Laplacian regularization of the latent states. Our experiments on three public expression databases show that the proposed model achieves better generalization performance compared to traditional L-CRFs and other related state-of-the-art models.

Access Paper or Ask Questions

Semantic Relationships Guided Representation Learning for Facial Action Unit Recognition

Apr 22, 2019
Guanbin Li, Xin Zhu, Yirui Zeng, Qing Wang, Liang Lin

Facial action unit (AU) recognition is a crucial task for facial expressions analysis and has attracted extensive attention in the field of artificial intelligence and computer vision. Existing works have either focused on designing or learning complex regional feature representations, or delved into various types of AU relationship modeling. Albeit with varying degrees of progress, it is still arduous for existing methods to handle complex situations. In this paper, we investigate how to integrate the semantic relationship propagation between AUs in a deep neural network framework to enhance the feature representation of facial regions, and propose an AU semantic relationship embedded representation learning (SRERL) framework. Specifically, by analyzing the symbiosis and mutual exclusion of AUs in various facial expressions, we organize the facial AUs in the form of structured knowledge-graph and integrate a Gated Graph Neural Network (GGNN) in a multi-scale CNN framework to propagate node information through the graph for generating enhanced AU representation. As the learned feature involves both the appearance characteristics and the AU relationship reasoning, the proposed model is more robust and can cope with more challenging cases, e.g., illumination change and partial occlusion. Extensive experiments on the two public benchmarks demonstrate that our method outperforms the previous work and achieves state of the art performance.

* Accepted by AAAI2019 as oral presentation 
Access Paper or Ask Questions

Spontaneous Emotion Recognition from Facial Thermal Images

Dec 13, 2020
Chirag Kyal

One of the key research areas in computer vision addressed by a vast number of publications is the processing and understanding of images containing human faces. The most often addressed tasks include face detection, facial landmark localization, face recognition and facial expression analysis. Other, more specialized tasks such as affective computing, the extraction of vital signs from videos or analysis of social interaction usually require one or several of the aforementioned tasks that have to be performed. In our work, we analyze that a large number of tasks for facial image processing in thermal infrared images that are currently solved using specialized rule-based methods or not solved at all can be addressed with modern learning-based approaches. We have used USTC-NVIE database for training of a number of machine learning algorithms for facial landmark localization.

Access Paper or Ask Questions