
"facial recognition": models, code, and papers

Fusing Body Posture with Facial Expressions for Joint Recognition of Affect in Child-Robot Interaction

Jan 07, 2019
Panagiotis P. Filntisis, Niki Efthymiou, Petros Koutras, Gerasimos Potamianos, Petros Maragos

In this paper we address the problem of multi-cue affect recognition in challenging environments such as child-robot interaction. Towards this goal we propose a method for automatic recognition of affect that leverages body expressions alongside facial expressions, as opposed to traditional methods that usually focus only on the latter. We evaluate our method on a challenging child-robot interaction database of emotional expressions, as well as on a database of emotional expressions by actors, and show that the proposed method achieves significantly better results than the facial expression baselines, can be trained either jointly or separately, and yields computational models both for the individual modalities and for whole-body emotion.
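One common way to combine the two modalities the abstract describes is late fusion of per-modality class posteriors. The sketch below is a minimal illustration of that idea, not the authors' exact scheme; the class layout and the fusion weight are assumptions.

```python
import numpy as np

def late_fusion(face_probs, body_probs, w_face=0.5):
    """Weighted late fusion of per-modality class posteriors.

    face_probs, body_probs: arrays of shape (n_classes,), each summing to 1.
    Returns the fused distribution and the predicted class index.
    """
    face_probs = np.asarray(face_probs, dtype=float)
    body_probs = np.asarray(body_probs, dtype=float)
    fused = w_face * face_probs + (1.0 - w_face) * body_probs
    fused /= fused.sum()  # renormalize for numerical safety
    return fused, int(np.argmax(fused))

# Face alone is unsure between class 0 and class 1;
# the body cue tips the fused decision toward class 0.
fused, pred = late_fusion([0.48, 0.52, 0.0], [0.8, 0.1, 0.1], w_face=0.5)
print(pred)  # -> 0
```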

Access Paper or Ask Questions

Recognizing Facial Expressions in the Wild using Multi-Architectural Representations based Ensemble Learning with Distillation

Jul 04, 2021
Rauf Momin, Ali Shan Momin, Khalid Rasheed, Muhammad Saqib

Facial expressions are the most common universal form of body language. In the past few years, automatic facial expression recognition (FER) has been an active field of research. However, it remains a challenging task due to various uncertainties and complications, and efficiency and performance are essential for building robust systems. In this work, we propose two models, EmoXNet and EmoXNetLite. EmoXNet is an ensemble learning technique for learning convoluted facial representations, whereas EmoXNetLite is a distillation technique that transfers knowledge from our ensemble model to an efficient deep neural network using label-smoothed soft labels, enabling effective real-time expression detection. Both models attain higher accuracy than models reported to date. The ensemble model (EmoXNet) attained 85.07% test accuracy on FER-2013 with FER+ annotations and 86.25% test accuracy on the Real-world Affective Faces Database (RAF-DB), while the distilled model (EmoXNetLite) attained 82.07% test accuracy on FER-2013 with FER+ annotations and 81.78% test accuracy on RAF-DB. The results show that our models generalize well to new data and learn to focus on the facial representations relevant to expression recognition.
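The distillation step can be illustrated with a minimal sketch of label-smoothed soft-label distillation: the teacher's soft labels are mixed with the uniform distribution, and the student is trained with cross-entropy against them. The function names and hyperparameters below are illustrative assumptions, not the paper's exact recipe.

```python
import numpy as np

def smooth_labels(probs, eps=0.1):
    """Label smoothing: mix a target distribution with the uniform distribution."""
    probs = np.asarray(probs, dtype=float)
    return (1.0 - eps) * probs + eps / probs.shape[-1]

def distillation_loss(student_logits, teacher_probs, eps=0.1):
    """Cross-entropy of the student's softmax against the teacher's
    label-smoothed soft labels."""
    targets = smooth_labels(teacher_probs, eps)
    z = np.asarray(student_logits, dtype=float)
    log_p = z - (z.max() + np.log(np.exp(z - z.max()).sum()))  # stable log-softmax
    return float(-(targets * log_p).sum())
```

Minimizing this loss pulls the student's predictions toward the teacher's, while the smoothing term keeps the targets away from hard 0/1 values.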

* 5 pages, 3 figures, 4 tables 

Finding your Lookalike: Measuring Face Similarity Rather than Face Identity

Jun 13, 2018
Amir Sadovnik, Wassim Gharbi, Thanh Vu, Andrew Gallagher

Face images are one of the main areas of focus for computer vision, receiving attention across a wide variety of tasks. Although face recognition is probably the most widely researched, many other tasks such as kinship detection, facial expression classification, and facial aging have been examined. In this work we propose the new, subjective task of quantifying perceived face similarity between a pair of faces. That is, we predict the perceived similarity between facial images, given that they are not of the same person. Although this task is clearly correlated with face recognition, it is different and therefore justifies a separate investigation. Humans often remark that two persons look alike, even in cases where the persons are not actually confused with one another. In addition, because face similarity differs from traditional image similarity, there are challenges in data collection and labeling, and in dealing with diverging subjective opinions between human labelers. We present evidence that finding facial look-alikes and recognizing faces are two distinct tasks. We propose a new dataset for facial similarity and introduce the Lookalike network, directed towards similar-face classification, which outperforms the ad hoc usage of a face recognition network directed at the same task.

* Accepted to the 1st CVPR Workshop on Visual Understanding of Subjective Attributes of Data 2018 

EXPERTNet: Exigent Features Preservative Network for Facial Expression Recognition

Apr 14, 2019
Monu Verma, Jaspreet Kaur Bhui, Santosh Vipparthi, Girdhari Singh

Facial expressions carry essential cues for inferring a human's state of mind, conveying adequate information to understand an individual's actual feelings. Thus, automatic facial expression recognition is an interesting and crucial task for interpreting the human cognitive state through machines. In this paper, we propose an Exigent Features Preservative Network (EXPERTNet) to describe the features of facial expressions. EXPERTNet extracts only pertinent features and neglects others by using an exigent feature (ExFeat) block, mainly comprising an elective layer. Specifically, the elective layer selects the desired edge-variation features from the previous layer's outputs, which are generated by applying filters of different sizes: 1 x 1, 3 x 3, 5 x 5 and 7 x 7. The different filter sizes help elicit both micro- and high-level features, enhancing the learnability of the neurons. The ExFeat block preserves the spatial structural information of the facial expression, which allows discrimination between different classes of facial expressions. Visual representations of the proposed method over different facial expressions show the learning capability of neurons in different layers. Experimental and comparative analysis over four comprehensive datasets, CK+, MMI, DISFA and GEMEP-FERA, confirms the better performance of the proposed network compared to existing networks.
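The "elective" selection the abstract describes, keeping the strongest edge-variation response among filters of different sizes while preserving spatial layout, can be sketched roughly as follows. This is an illustrative winner-take-all interpretation, not the authors' exact layer.

```python
import numpy as np

def elective_select(responses):
    """'Elective' selection: at each spatial position keep the response from
    whichever filter has the largest magnitude, preserving spatial layout.

    responses: list of same-shaped (H, W) response maps, one per filter size.
    """
    R = np.stack([np.asarray(r, dtype=float) for r in responses])  # (n, H, W)
    pick = np.abs(R).argmax(axis=0)                                # winner per pixel
    h, w = pick.shape
    return R[pick, np.arange(h)[:, None], np.arange(w)[None, :]]
```

Because the output keeps one value per spatial position, the selected map retains the spatial structure that the abstract emphasizes.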


Your "Attention" Deserves Attention: A Self-Diversified Multi-Channel Attention for Facial Action Analysis

Mar 23, 2022
Xiaotian Li, Zhihua Li, Huiyuan Yang, Geran Zhao, Lijun Yin

Visual attention has been extensively studied for learning fine-grained features in both facial expression recognition (FER) and Action Unit (AU) detection. A broad range of previous research has explored how to use attention modules to localize detailed facial parts (e.g., facial action units), learn discriminative features, and learn inter-class correlation. However, few related works pay attention to the robustness of the attention module itself. Through experiments, we found that neural attention maps initialized with different feature maps yield diverse representations when learning to attend to the identical Region of Interest (ROI). In other words, as with general feature learning, the representational quality of attention maps greatly affects the performance of a model, meaning that unconstrained attention learning involves considerable randomness. This uncertainty causes conventional attention learning to fall into sub-optimal solutions. In this paper, we propose a compact model that enhances the representational and focusing power of neural attention maps and learns the "inter-attention" correlation for refined attention maps, which we term the Self-Diversified Multi-Channel Attention Network (SMA-Net). The proposed method is evaluated on two benchmark databases (BP4D and DISFA) for AU detection and four databases (CK+, MMI, BU-3DFE, and BP4D+) for facial expression recognition. It achieves superior performance compared to state-of-the-art methods.
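A simple way to quantify the diversity the abstract argues for is a pairwise-similarity penalty over the channel attention maps; minimizing it pushes the channels toward distinct representations. This is a generic sketch of such a regularizer, not SMA-Net's actual loss.

```python
import numpy as np

def diversity_penalty(attn_maps):
    """Mean pairwise cosine similarity between flattened attention maps.

    Lower is more diverse; adding this term to the training loss discourages
    the channels from collapsing onto the same attention pattern.
    """
    A = np.asarray(attn_maps, dtype=float).reshape(len(attn_maps), -1)
    A = A / np.linalg.norm(A, axis=1, keepdims=True)  # unit-normalize each map
    sim = A @ A.T                                     # cosine similarity matrix
    n = len(attn_maps)
    return float(sim[~np.eye(n, dtype=bool)].mean())  # mean of off-diagonal terms
```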

* FG2021 (long oral) 

Towards On-Device Face Recognition in Body-worn Cameras

Apr 07, 2021
Ali Almadan, Ajita Rattani

Face recognition technology for identifying individuals is widely adopted in intelligence gathering, law enforcement, surveillance, and consumer applications. Recently, this technology has been ported to smartphones and body-worn cameras (BWC). Face recognition in body-worn cameras is used for surveillance, situational awareness, and keeping officers safe. Only a handful of academic studies exist on face recognition using body-worn cameras. A recent study assembled the BWCFace facial image dataset acquired using a body-worn camera and evaluated the ResNet-50 model for face identification. However, real-time inference on resource-constrained body-worn cameras and privacy concerns involving facial images require on-device face recognition. To this end, this study evaluates the lightweight MobileNet-V2, EfficientNet-B0, LightCNN-9, and LightCNN-29 models for face identification using body-worn cameras. Experiments are performed on the publicly available BWCFace dataset, and real-time inference is evaluated on three mobile devices. A comparative analysis against the heavyweight VGG-16 and ResNet-50 models, along with six hand-crafted features, evaluates the trade-off between performance and model size. Experimental results show that the difference in maximum rank-1 accuracy between the lightweight LightCNN-29 and the best-performing ResNet-50 is 1.85%, with a reduction of 23.49M model parameters. Most of the deep models obtained similar performance at rank-5 and rank-10. The inference time of the LightCNNs is 2.1x faster than the other models on mobile devices. The smallest performance difference, 14% at rank-1, is noted between LightCNN-29 and the Local Phase Quantization (LPQ) descriptor. In most experimental settings, the lightweight LightCNN models offered the best trade-off between accuracy and model size.

* IEEE International Workshop on Biometrics and Forensics (IWBF) 2021 
* 6 pages 

High Performance Human Face Recognition using Independent High Intensity Gabor Wavelet Responses: A Statistical Approach

Jun 17, 2011
Arindam Kar, Debotosh Bhattacharjee, Dipak Kumar Basu, Mita Nasipuri, Mahantapas Kundu

In this paper, we present a technique in which high-intensity feature vectors extracted from the Gabor wavelet transformation of frontal face images are combined with Independent Component Analysis (ICA) for enhanced face recognition. First, the high-intensity feature vectors are automatically extracted from the Gabor-transformed images using the local characteristics of each individual face. ICA is then applied to these locally extracted high-intensity feature vectors to obtain the independent high-intensity feature (IHIF) vectors, which form the basis of this work. Finally, image classification is performed using these IHIF vectors as representatives of the images. The importance of combining ICA with the high-intensity features of the Gabor wavelet transformation is twofold. On the one hand, the selected peaks of the Gabor-transformed face images exhibit strong characteristics of spatial locality, scale, and orientation selectivity, producing salient local features that are well suited to face recognition. On the other hand, because ICA employs locally salient features from the most informative facial parts, it reduces redundancy and represents independent features explicitly. These independent features are most useful for subsequent facial discrimination and associative recall. The efficiency of the IHIF method is demonstrated by experiments on frontal facial image datasets selected from the FERET, FRAV2D, and ORL databases.
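The first stage, picking high-intensity peaks from a Gabor response, can be sketched as follows. The kernel parameters and the top-k selection rule here are illustrative assumptions; the full method would then pass such features through ICA (e.g., a FastICA implementation), which is omitted in this sketch.

```python
import numpy as np

def gabor_kernel(size=9, theta=0.0, lam=4.0, sigma=2.0, gamma=0.5):
    """Real part of a 2D Gabor filter (orientation theta, wavelength lam)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + (gamma * yr) ** 2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * xr / lam)

def top_k_peaks(response, k=3):
    """Coordinates of the k highest-magnitude responses: the
    'high-intensity' features kept for the downstream ICA step."""
    flat = np.abs(response).ravel()
    idx = np.argsort(flat)[::-1][:k]
    return [np.unravel_index(i, response.shape) for i in idx]
```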

* International Journal of Computer Science & Emerging Technologies pp 178-187, Volume 2, Issue 1, February 2011 
* Keywords: Feature extraction; Gabor Wavelets; independent high-intensity feature (IHIF); Independent Component Analysis (ICA); Specificity; Sensitivity; Cosine Similarity Measure; E-ISSN: 2044-6004 

CMS-RCNN: Contextual Multi-Scale Region-based CNN for Unconstrained Face Detection

Jun 17, 2016
Chenchen Zhu, Yutong Zheng, Khoa Luu, Marios Savvides

Robust face detection in the wild is one of the ultimate components supporting various facial-related problems, i.e. unconstrained face recognition, facial periocular recognition, facial landmarking and pose estimation, facial expression recognition, 3D facial model construction, etc. Although the face detection problem has been intensely studied for decades, with various commercial applications, it still encounters problems in some real-world scenarios due to numerous challenges, e.g. heavy facial occlusions, extremely low resolutions, strong illumination, exceptional pose variations, and image or video compression artifacts. In this paper, we present a face detection approach named Contextual Multi-Scale Region-based Convolutional Neural Network (CMS-RCNN) to robustly solve the problems mentioned above. Similar to region-based CNNs, our proposed network consists of a region proposal component and a region-of-interest (RoI) detection component. However, unlike those networks, our proposed network makes two main contributions that play a significant role in achieving state-of-the-art face detection performance. Firstly, multi-scale information is grouped both in region proposal and in RoI detection to deal with tiny face regions. Secondly, our proposed network allows explicit body contextual reasoning, inspired by the intuition of the human vision system. The proposed approach is benchmarked on two recent challenging face detection databases, i.e. the WIDER FACE Dataset, which contains a high degree of variability, and the Face Detection Dataset and Benchmark (FDDB). The experimental results show that our approach, trained on the WIDER FACE Dataset, outperforms strong baselines on WIDER FACE by a large margin and consistently achieves competitive results on FDDB against recent state-of-the-art face detection methods.
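The first contribution, grouping multi-scale information for RoI detection, is commonly realized by concatenating normalized RoI features pooled from several layers so that no single scale dominates. The sketch below illustrates that generic pattern under stated assumptions; it is not the exact CMS-RCNN implementation.

```python
import numpy as np

def fuse_multiscale_roi(feats):
    """Concatenate L2-normalized RoI feature vectors pooled from several
    scales. Normalizing each scale first keeps activation magnitudes from
    deeper (typically larger-valued) layers from dominating the fused vector."""
    normed = [np.asarray(f, dtype=float) / (np.linalg.norm(f) + 1e-8)
              for f in feats]
    return np.concatenate(normed)
```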


Dynamic Multi-Task Learning for Face Recognition with Facial Expression

Nov 08, 2019
Zuheng Ming, Junshi Xia, Muhammad Muzzamil Luqman, Jean-Christophe Burie, Kaixing Zhao

Benefiting from the joint learning of multiple tasks, deep multi-task networks have shown promising performance in many applications compared to single-task learning. However, the performance of a multi-task learning framework is highly dependent on the relative weights of the tasks, and how to assign each task's weight is a critical issue in multi-task learning. Instead of tuning the weights manually, which is exhausting and time-consuming, in this paper we propose an approach that dynamically adapts the weights of the tasks according to each task's training difficulty. The proposed method introduces no extra hyperparameters, and its simple structure allows other multi-task deep learning networks to easily adopt or reproduce it. We demonstrate our approach for face recognition with facial expression and for facial expression recognition from a single input image, based on deep multi-task learning with Convolutional Neural Networks (CNNs). Both the theoretical analysis and the experimental results demonstrate the effectiveness of the proposed dynamic multi-task learning method. Multi-task learning with dynamic weights also boosts performance on the different tasks compared to state-of-the-art single-task learning methods.
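One concrete way to adapt task weights from training difficulty, similar in spirit to (though not necessarily identical with) the paper's scheme, is to weight each task by how slowly its loss is decreasing: tasks whose loss ratio between steps stays near 1 are "harder" and get a larger weight. The temperature and normalization below are illustrative choices.

```python
import numpy as np

def dynamic_weights(prev_losses, curr_losses, temperature=2.0):
    """Softmax over per-task loss ratios. A task whose loss is decreasing
    slowly (ratio near or above 1) receives a larger weight; weights are
    scaled to sum to the number of tasks."""
    ratios = (np.asarray(curr_losses, dtype=float)
              / np.asarray(prev_losses, dtype=float))
    e = np.exp(ratios / temperature)
    return len(ratios) * e / e.sum()
```

At each training step, the total loss would then be the weighted sum of the per-task losses under these weights.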

* Accepted by the ICCV 2019 workshop 

A novel classification-selection approach for the self updating of template-based face recognition systems

Nov 28, 2019
Giulia Orrù, Gian Luca Marcialis, Fabio Roli

The growing need for security has notably increased the number of possible facial recognition applications, especially due to the success of the Internet of Things (IoT) paradigm. However, although handcrafted and deep-learning-inspired facial features have reached a significant level of compactness and expressive power, facial recognition performance still suffers from intra-class variations such as ageing, facial expressions, lighting changes, and pose. These variations cannot be captured in a single acquisition; capturing them requires multiple acquisitions of long duration, which are expensive and demand a high level of collaboration from users. Among other solutions, self-update algorithms have been proposed to mitigate these problems. Self-updating aims to add novel templates to the user's gallery from the inputs submitted during system operation. Consequently, computational complexity and storage space tend to be among the critical requirements of these algorithms. The present paper addresses the above problems with a novel template-based self-update algorithm, able to preserve over time the expressive power of a limited set of templates stored in the system database. The rationale behind the proposed approach is the working hypothesis that a dominating mode characterises the feature distribution of each client; the key point is therefore to select the best templates around that mode. We propose two methods, which are tested on systems based on handcrafted features and on state-of-the-art deep-learning-inspired autoencoders. Three benchmark data sets are used. Experimental results confirm that, given effective and compact feature sets that support our working hypothesis, the proposed classification-selection approaches overcome the problem of manual updating and, where present, stringent computational requirements.
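The template-selection idea, keeping templates around the dominating mode of a client's feature distribution, can be sketched by approximating the mode with the gallery medoid and keeping its nearest templates. The Euclidean metric and the medoid approximation are illustrative assumptions, not the paper's exact selection rule.

```python
import numpy as np

def select_templates(features, k=3):
    """Keep the k templates closest to the densest region of the gallery.

    The dominating mode is approximated by the medoid: the sample with the
    smallest total distance to all other samples. Returns sorted indices
    of the retained templates.
    """
    X = np.asarray(features, dtype=float)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise dists
    medoid = int(np.argmin(d.sum(axis=1)))                      # densest sample
    order = np.argsort(d[medoid])                               # nearest first
    return sorted(order[:k].tolist())
```

Outliers far from the mode (e.g., impostor or badly acquired templates) naturally fall outside the retained set, keeping the gallery compact.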

* This is an original manuscript of an article published by Elsevier in Pattern Recognition on 27 November 2019. Available online: 