Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"facial recognition": models, code, and papers

Your "Attention" Deserves Attention: A Self-Diversified Multi-Channel Attention for Facial Action Analysis

Mar 23, 2022
Xiaotian Li, Zhihua Li, Huiyuan Yang, Geran Zhao, Lijun Yin

Visual attention has been extensively studied for learning fine-grained features in both facial expression recognition (FER) and Action Unit (AU) detection. A broad range of previous research has explored how to use attention modules to localize detailed facial parts (e,g. facial action units), learn discriminative features, and learn inter-class correlation. However, few related works pay attention to the robustness of the attention module itself. Through experiments, we found neural attention maps initialized with different feature maps yield diverse representations when learning to attend the identical Region of Interest (ROI). In other words, similar to general feature learning, the representational quality of attention maps also greatly affects the performance of a model, which means unconstrained attention learning has lots of randomnesses. This uncertainty lets conventional attention learning fall into sub-optimal. In this paper, we propose a compact model to enhance the representational and focusing power of neural attention maps and learn the "inter-attention" correlation for refined attention maps, which we term the "Self-Diversified Multi-Channel Attention Network (SMA-Net)". The proposed method is evaluated on two benchmark databases (BP4D and DISFA) for AU detection and four databases (CK+, MMI, BU-3DFE, and BP4D+) for facial expression recognition. It achieves superior performance compared to the state-of-the-art methods.

* FG2021(long Oral) 

Towards On-Device Face Recognition in Body-worn Cameras

Apr 07, 2021
Ali Almadan, Ajita Rattani

Face recognition technology related to recognizing identities is widely adopted in intelligence gathering, law enforcement, surveillance, and consumer applications. Recently, this technology has been ported to smartphones and body-worn cameras (BWC). Face recognition technology in body-worn cameras is used for surveillance, situational awareness, and keeping the officer safe. Only a handful of academic studies exist in face recognition using the body-worn camera. A recent study has assembled BWCFace facial image dataset acquired using a body-worn camera and evaluated the ResNet-50 model for face identification. However, for real-time inference in resource constraint body-worn cameras and privacy concerns involving facial images, on-device face recognition is required. To this end, this study evaluates lightweight MobileNet-V2, EfficientNet-B0, LightCNN-9 and LightCNN-29 models for face identification using body-worn camera. Experiments are performed on a publicly available BWCface dataset. The real-time inference is evaluated on three mobile devices. The comparative analysis is done with heavy-weight VGG-16 and ResNet-50 models along with six hand-crafted features to evaluate the trade-off between the performance and model size. Experimental results suggest the difference in maximum rank-1 accuracy of lightweight LightCNN-29 over best-performing ResNet-50 is \textbf{1.85\%} and the reduction in model parameters is \textbf{23.49M}. Most of the deep models obtained similar performances at rank-5 and rank-10. The inference time of LightCNNs is 2.1x faster than other models on mobile devices. The least performance difference of \textbf{14\%} is noted between LightCNN-29 and Local Phase Quantization (LPQ) descriptor at rank-1. In most of the experimental settings, lightweight LightCNN models offered the best trade-off between accuracy and the model size in comparison to most of the models.

* IEEE International Workshop on Biometrics and Forensics (IWBF) 2021 
* 6 pages 

High Performance Human Face Recognition using Independent High Intensity Gabor Wavelet Responses: A Statistical Approach

Jun 17, 2011
Arindam Kar, Debotosh Bhattacharjee, Dipak Kumar Basu, Mita Nasipuri, Mahantapas Kundu

In this paper, we present a technique by which high-intensity feature vectors extracted from the Gabor wavelet transformation of frontal face images, is combined together with Independent Component Analysis (ICA) for enhanced face recognition. Firstly, the high-intensity feature vectors are automatically extracted using the local characteristics of each individual face from the Gabor transformed images. Then ICA is applied on these locally extracted high-intensity feature vectors of the facial images to obtain the independent high intensity feature (IHIF) vectors. These IHIF forms the basis of the work. Finally, the image classification is done using these IHIF vectors, which are considered as representatives of the images. The importance behind implementing ICA along with the high-intensity features of Gabor wavelet transformation is twofold. On the one hand, selecting peaks of the Gabor transformed face images exhibit strong characteristics of spatial locality, scale, and orientation selectivity. Thus these images produce salient local features that are most suitable for face recognition. On the other hand, as the ICA employs locally salient features from the high informative facial parts, it reduces redundancy and represents independent features explicitly. These independent features are most useful for subsequent facial discrimination and associative recall. The efficiency of IHIF method is demonstrated by the experiment on frontal facial images dataset, selected from the FERET, FRAV2D, and the ORL database.

* International Journal of Computer Science & Emerging Technologies pp 178-187, Volume 2, Issue 1, February 2011 
* Keywords: Feature extraction; Gabor Wavelets; independent high-intensity feature (IHIF); Independent Component Analysis (ICA); Specificity; Sensitivity; Cosine Similarity Measure; E-ISSN: 2044-6004 

CMS-RCNN: Contextual Multi-Scale Region-based CNN for Unconstrained Face Detection

Jun 17, 2016
Chenchen Zhu, Yutong Zheng, Khoa Luu, Marios Savvides

Robust face detection in the wild is one of the ultimate components to support various facial related problems, i.e. unconstrained face recognition, facial periocular recognition, facial landmarking and pose estimation, facial expression recognition, 3D facial model construction, etc. Although the face detection problem has been intensely studied for decades with various commercial applications, it still meets problems in some real-world scenarios due to numerous challenges, e.g. heavy facial occlusions, extremely low resolutions, strong illumination, exceptionally pose variations, image or video compression artifacts, etc. In this paper, we present a face detection approach named Contextual Multi-Scale Region-based Convolution Neural Network (CMS-RCNN) to robustly solve the problems mentioned above. Similar to the region-based CNNs, our proposed network consists of the region proposal component and the region-of-interest (RoI) detection component. However, far apart of that network, there are two main contributions in our proposed network that play a significant role to achieve the state-of-the-art performance in face detection. Firstly, the multi-scale information is grouped both in region proposal and RoI detection to deal with tiny face regions. Secondly, our proposed network allows explicit body contextual reasoning in the network inspired from the intuition of human vision system. The proposed approach is benchmarked on two recent challenging face detection databases, i.e. the WIDER FACE Dataset which contains high degree of variability, as well as the Face Detection Dataset and Benchmark (FDDB). The experimental results show that our proposed approach trained on WIDER FACE Dataset outperforms strong baselines on WIDER FACE Dataset by a large margin, and consistently achieves competitive results on FDDB against the recent state-of-the-art face detection methods.


Dynamic Multi-Task Learning for Face Recognition with Facial Expression

Nov 08, 2019
Zuheng Ming, Junshi Xia, Muhammad Muzzamil Luqman, Jean-Christophe Burie, Kaixing Zhao

Benefiting from the joint learning of the multiple tasks in the deep multi-task networks, many applications have shown the promising performance comparing to single-task learning. However, the performance of multi-task learning framework is highly dependant on the relative weights of the tasks. How to assign the weight of each task is a critical issue in the multi-task learning. Instead of tuning the weights manually which is exhausted and time-consuming, in this paper we propose an approach which can dynamically adapt the weights of the tasks according to the difficulty for training the task. Specifically, the proposed method does not introduce the hyperparameters and the simple structure allows the other multi-task deep learning networks can easily realize or reproduce this method. We demonstrate our approach for face recognition with facial expression and facial expression recognition from a single input image based on a deep multi-task learning Conventional Neural Networks (CNNs). Both the theoretical analysis and the experimental results demonstrate the effectiveness of the proposed dynamic multi-task learning method. This multi-task learning with dynamic weights also boosts of the performance on the different tasks comparing to the state-of-art methods with single-task learning.

* accept by the ICCV2019 workshop 

A novel classification-selection approach for the self updating of template-based face recognition systems

Nov 28, 2019
Giulia Orrù, Gian Luca Marcialis, Fabio Roli

The boosting on the need of security notably increased the amount of possible facial recognition applications, especially due to the success of the Internet of Things (IoT) paradigm. However, although handcrafted and deep learning-inspired facial features reached a significant level of compactness and expressive power, the facial recognition performance still suffers from intra-class variations such as ageing, facial expressions, lighting changes, and pose. These variations cannot be captured in a single acquisition and require multiple acquisitions of long duration, which are expensive and need a high level of collaboration from the users. Among others, self-update algorithms have been proposed in order to mitigate these problems. Self-updating aims to add novel templates to the users' gallery among the inputs submitted during system operations. Consequently, computational complexity and storage space tend to be among the critical requirements of these algorithms. The present paper deals with the above problems by a novel template-based self-update algorithm, able to keep over time the expressive power of a limited set of templates stored in the system database. The rationale behind the proposed approach is in the working hypothesis that a dominating mode characterises the features' distribution given the client. Therefore, the key point is to select the best templates around that mode. We propose two methods, which are tested on systems based on handcrafted features and deep-learning-inspired autoencoders at the state-of-the-art. Three benchmark data sets are used. Experimental results confirm that, by effective and compact feature sets which can support our working hypothesis, the proposed classification-selection approaches overcome the problem of manual updating and, in case, stringent computational requirements.

* This is an original manuscript of an article published by Elsevier in Pattern Recognition on 27 November 2019. Available online: 

Memory Integrity of CNNs for Cross-Dataset Facial Expression Recognition

May 28, 2019
Dylan C. Tannugi, Alceu S. Britto Jr., Alessandro L. Koerich

Facial expression recognition is a major problem in the domain of artificial intelligence. One of the best ways to solve this problem is the use of convolutional neural networks (CNNs). However, a large amount of data is required to train properly these networks but most of the datasets available for facial expression recognition are relatively small. A common way to circumvent the lack of data is to use CNNs trained on large datasets of different domains and fine-tuning the layers of such networks to the target domain. However, the fine-tuning process does not preserve the memory integrity as CNNs have the tendency to forget patterns they have learned. In this paper, we evaluate different strategies of fine-tuning a CNN with the aim of assessing the memory integrity of such strategies in a cross-dataset scenario. A CNN pre-trained on a source dataset is used as the baseline and four adaptation strategies have been evaluated: fine-tuning its fully connected layers; fine-tuning its last convolutional layer and its fully connected layers; retraining the CNN on a target dataset; and the fusion of the source and target datasets and retraining the CNN. Experimental results on four datasets have shown that the fusion of the source and the target datasets provides the best trade-off between accuracy and memory integrity.


Automatic facial feature extraction and expression recognition based on neural network

Apr 10, 2012
S. P. Khandait, R. C. Thool, P. D. Khandait

In this paper, an approach to the problem of automatic facial feature extraction from a still frontal posed image and classification and recognition of facial expression and hence emotion and mood of a person is presented. Feed forward back propagation neural network is used as a classifier for classifying the expressions of supplied face into seven basic categories like surprise, neutral, sad, disgust, fear, happy and angry. For face portion segmentation and localization, morphological image processing operations are used. Permanent facial features like eyebrows, eyes, mouth and nose are extracted using SUSAN edge detection operator, facial geometry, edge projection analysis. Experiments are carried out on JAFFE facial expression database and gives better performance in terms of 100% accuracy for training set and 95.26% accuracy for test set.

* 6 pages,pp. 113-118, (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 2, No.1, January 2011 

Feature Decomposition and Reconstruction Learning for Effective Facial Expression Recognition

Apr 21, 2021
Delian Ruan, Yan Yan, Shenqi Lai, Zhenhua Chai, Chunhua Shen, Hanzi Wang

In this paper, we propose a novel Feature Decomposition and Reconstruction Learning (FDRL) method for effective facial expression recognition. We view the expression information as the combination of the shared information (expression similarities) across different expressions and the unique information (expression-specific variations) for each expression. More specifically, FDRL mainly consists of two crucial networks: a Feature Decomposition Network (FDN) and a Feature Reconstruction Network (FRN). In particular, FDN first decomposes the basic features extracted from a backbone network into a set of facial action-aware latent features to model expression similarities. Then, FRN captures the intra-feature and inter-feature relationships for latent features to characterize expression-specific variations, and reconstructs the expression feature. To this end, two modules including an intra-feature relation modeling module and an inter-feature relation modeling module are developed in FRN. Experimental results on both the in-the-lab databases (including CK+, MMI, and Oulu-CASIA) and the in-the-wild databases (including RAF-DB and SFEW) show that the proposed FDRL method consistently achieves higher recognition accuracy than several state-of-the-art methods. This clearly highlights the benefit of feature decomposition and reconstruction for classifying expressions.

* accepted to CVPR 2021