Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"facial recognition": models, code, and papers

Improving Facial Analysis and Performance Driven Animation through Disentangling Identity and Expression

May 22, 2016
David Rim, Sina Honari, Md Kamrul Hasan, Chris Pal

We present techniques for improving performance driven facial animation, emotion recognition, and facial key-point or landmark prediction using learned identity invariant representations. Established approaches to these problems can work well if sufficient examples and labels for a particular identity are available and factors of variation are highly controlled. However, labeled examples of facial expressions, emotions and key-points for new individuals are difficult and costly to obtain. In this paper we improve the ability of techniques to generalize to new and unseen individuals by explicitly modeling previously seen variations related to identity and expression. We use a weakly-supervised approach in which identity labels are used to learn the different factors of variation linked to identity separately from factors related to expression. We show how probabilistic modeling of these sources of variation allows one to learn identity-invariant representations for expressions which can then be used to identity-normalize various procedures for facial expression analysis and animation control. We also show how to extend the widely used techniques of active appearance models and constrained local models through replacing the underlying point distribution models which are typically constructed using principal component analysis with identity-expression factorized representations. We present a wide variety of experiments in which we consistently improve performance on emotion recognition, markerless performance-driven facial animation and facial key-point tracking.

* to appear in Image and Vision Computing Journal (IMAVIS) 

Deep learning for identification and face, gender, expression recognition under constraints

Nov 02, 2021
Ahmad B. Hassanat, Abeer Albustanji, Ahmad S. Tarawneh, Malek Alrashidi, Hani Alharbi, Mohammed Alanazi, Mansoor Alghamdi, Ibrahim S Alkhazi, V. B. Surya Prasath

Biometric recognition based on the full face is an extensive research area. However, using only partially visible faces, such as in the case of veiled-persons, is a challenging task. Deep convolutional neural network (CNN) is used in this work to extract the features from veiled-person face images. We found that the sixth and the seventh fully connected layers, FC6 and FC7 respectively, in the structure of the VGG19 network provide robust features with each of these two layers containing 4096 features. The main objective of this work is to test the ability of deep learning based automated computer system to identify not only persons, but also to perform recognition of gender, age, and facial expressions such as eye smile. Our experimental results indicate that we obtain high accuracy for all the tasks. The best recorded accuracy values are up to 99.95% for identifying persons, 99.9% for gender recognition, 99.9% for age recognition and 80.9% for facial expression (eye smile) recognition.

* Submitted to International Journal of Biometrics 

Low-resolution Face Recognition in the Wild via Selective Knowledge Distillation

Nov 25, 2018
Shiming Ge, Shengwei Zhao, Chenyu Li, Jia Li

Typically, the deployment of face recognition models in the wild needs to identify low-resolution faces with extremely low computational cost. To address this problem, a feasible solution is compressing a complex face model to achieve higher speed and lower memory at the cost of minimal performance drop. Inspired by that, this paper proposes a learning approach to recognize low-resolution faces via selective knowledge distillation. In this approach, a two-stream convolutional neural network (CNN) is first initialized to recognize high-resolution faces and resolution-degraded faces with a teacher stream and a student stream, respectively. The teacher stream is represented by a complex CNN for high-accuracy recognition, and the student stream is represented by a much simpler CNN for low-complexity recognition. To avoid significant performance drop at the student stream, we then selectively distil the most informative facial features from the teacher stream by solving a sparse graph optimization problem, which are then used to regularize the fine-tuning process of the student stream. In this way, the student stream is actually trained by simultaneously handling two tasks with limited computational resources: approximating the most informative facial cues via feature regression, and recovering the missing facial cues via low-resolution face classification. Experimental results show that the student stream performs impressively in recognizing low-resolution faces and costs only 0.15MB memory and runs at 418 faces per second on CPU and 9,433 faces per second on GPU.


BWCFace: Open-set Face Recognition using Body-worn Camera

Sep 24, 2020
Ali Almadan, Anoop Krishnan, Ajita Rattani

With computer vision reaching an inflection point in the past decade, face recognition technology has become pervasive in policing, intelligence gathering, and consumer applications. Recently, face recognition technology has been deployed on bodyworn cameras to keep officers safe, enabling situational awareness and providing evidence for trial. However, limited academic research has been conducted on this topic using traditional techniques on datasets with small sample size. This paper aims to bridge the gap in the state-of-the-art face recognition using bodyworn cameras (BWC). To this aim, the contribution of this work is two-fold: (1) collection of a dataset called BWCFace consisting of a total of 178K facial images of 132 subjects captured using the body-worn camera in in-door and daylight conditions, and (2) open-set evaluation of the latest deep-learning-based Convolutional Neural Network (CNN) architectures combined with five different loss functions for face identification, on the collected dataset. Experimental results on our BWCFace dataset suggest a maximum of 33.89% Rank-1 accuracy obtained when facial features are extracted using SENet-50 trained on a large scale VGGFace2 facial image dataset. However, performance improved up to a maximum of 99.00% Rank-1 accuracy when pretrained CNN models are fine-tuned on a subset of identities in our BWCFace dataset. Equivalent performances were obtained across body-worn camera sensor models used in existing face datasets. The collected BWCFace dataset and the pretrained/ fine-tuned algorithms are publicly available to promote further research and development in this area. A downloadable link of this dataset and the algorithms is available by contacting the authors.

* 19th IEEE International Conference On Machine Learning And Applications 2020 | Miami, Florida 

Thermal to Visible Face Recognition Using Deep Autoencoders

Feb 10, 2020
Alperen Kantarcı, Hazım Kemal Ekenel

Visible face recognition systems achieve nearly perfect recognition accuracies using deep learning. However, in lack of light, these systems perform poorly. A way to deal with this problem is thermal to visible cross-domain face matching. This is a desired technology because of its usefulness in night time surveillance. Nevertheless, due to differences between two domains, it is a very challenging face recognition problem. In this paper, we present a deep autoencoder based system to learn the mapping between visible and thermal face images. Also, we assess the impact of alignment in thermal to visible face recognition. For this purpose, we manually annotate the facial landmarks on the Carl and EURECOM datasets. The proposed approach is extensively tested on three publicly available datasets: Carl, UND-X1, and EURECOM. Experimental results show that the proposed approach improves the state-of-the-art significantly. We observe that alignment increases the performance by around 2%. Annotated facial landmark positions in this study can be downloaded from the following link: .

* 5 pages, 3 figures, 2019 International Conference of the Biometrics Special Interest Group (BIOSIG) 

Disentanglement for Discriminative Visual Recognition

Jun 14, 2020
Xiaofeng Liu

Recent successes of deep learning-based recognition rely on maintaining the content related to the main-task label. However, how to explicitly dispel the noisy signals for better generalization in a controllable manner remains an open issue. For instance, various factors such as identity-specific attributes, pose, illumination and expression affect the appearance of face images. Disentangling the identity-specific factors is potentially beneficial for facial expression recognition (FER). This chapter systematically summarize the detrimental factors as task-relevant/irrelevant semantic variations and unspecified latent variation. In this chapter, these problems are casted as either a deep metric learning problem or an adversarial minimax game in the latent space. For the former choice, a generalized adaptive (N+M)-tuplet clusters loss function together with the identity-aware hard-negative mining and online positive mining scheme can be used for identity-invariant FER. The better FER performance can be achieved by combining the deep metric loss and softmax loss in a unified two fully connected layer branches framework via joint optimization. For the latter solution, it is possible to equipping an end-to-end conditional adversarial network with the ability to decompose an input sample into three complementary parts. The discriminative representation inherits the desired invariance property guided by prior knowledge of the task, which is marginal independent to the task-relevant/irrelevant semantic and latent variations. The framework achieves top performance on a serial of tasks, including lighting, makeup, disguise-tolerant face recognition and facial attributes recognition. This chapter systematically summarize the popular and practical solution for disentanglement to achieve more discriminative visual recognition.

* Manuscript for book "Recognition and perception of images" Willy 

Thermal Human face recognition based on Haar wavelet transform and series matching technique

Sep 04, 2013
Ayan Seal, Suranjan Ganguly, Debotosh Bhattacharjee, Mita Nasipuri, Dipak kr. Basu

Thermal infrared (IR) images represent the heat patterns emitted from hot object and they do not consider the energies reflected from an object. Objects living or non-living emit different amounts of IR energy according to their body temperature and characteristics. Humans are homoeothermic and hence capable of maintaining constant temperature under different surrounding temperature. Face recognition from thermal (IR) images should focus on changes of temperature on facial blood vessels. These temperature changes can be regarded as texture features of images and wavelet transform is a very good tool to analyze multi-scale and multi-directional texture. Wavelet transform is also used for image dimensionality reduction, by removing redundancies and preserving original features of the image. The sizes of the facial images are normally large. So, the wavelet transform is used before image similarity is measured. Therefore this paper describes an efficient approach of human face recognition based on wavelet transform from thermal IR images. The system consists of three steps. At the very first step, human thermal IR face image is preprocessed and the face region is only cropped from the entire image. Secondly, Haar wavelet is used to extract low frequency band from the cropped face region. Lastly, the image classification between the training images and the test images is done, which is based on low-frequency components. The proposed approach is tested on a number of human thermal infrared face images created at our own laboratory and Terravic Facial IR Database. Experimental results indicated that the thermal infra red face images can be recognized by the proposed system effectively. The maximum success of 95% recognition has been achieved.

* 12 pages. arXiv admin note: substantial text overlap with arXiv:1309.1009 

Micro-Expression Recognition Based on Attribute Information Embedding and Cross-modal Contrastive Learning

May 29, 2022
Yanxin Song, Jianzong Wang, Tianbo Wu, Zhangcheng Huang, Jing Xiao

Facial micro-expressions recognition has attracted much attention recently. Micro-expressions have the characteristics of short duration and low intensity, and it is difficult to train a high-performance classifier with the limited number of existing micro-expressions. Therefore, recognizing micro-expressions is a challenge task. In this paper, we propose a micro-expression recognition method based on attribute information embedding and cross-modal contrastive learning. We use 3D CNN to extract RGB features and FLOW features of micro-expression sequences and fuse them, and use BERT network to extract text information in Facial Action Coding System. Through cross-modal contrastive loss, we embed attribute information in the visual network, thereby improving the representation ability of micro-expression recognition in the case of limited samples. We conduct extensive experiments in CASME II and MMEW databases, and the accuracy is 77.82% and 71.04%, respectively. The comparative experiments show that this method has better recognition effect than other methods for micro-expression recognition.

* This paper has been accepted by IJCNN2022 

Occlusion-guided compact template learning for ensemble deep network-based pose-invariant face recognition

Apr 15, 2019
Yuhang Wu, Ioannis A. Kakadiaris

Concatenation of the deep network representations extracted from different facial patches helps to improve face recognition performance. However, the concatenated facial template increases in size and contains redundant information. Previous solutions aim to reduce the dimensionality of the facial template without considering the occlusion pattern of the facial patches. In this paper, we propose an occlusion-guided compact template learning (OGCTL) approach that only uses the information from visible patches to construct the compact template. The compact face representation is not sensitive to the number of patches that are used to construct the facial template and is more suitable for incorporating the information from different view angles for image-set based face recognition. Instead of using occlusion masks in face matching (e.g., DPRFS [38]), the proposed method uses occlusion masks in template construction and achieves significantly better image-set based face verification performance on a challenging database with a template size that is an order-of-magnitude smaller than DPRFS.

* Accepted by International Conference on Biometrics (ICB 2019) as an Oral presentation 

Independent Sign Language Recognition with 3D Body, Hands, and Face Reconstruction

Nov 24, 2020
Agelos Kratimenos, Georgios Pavlakos, Petros Maragos

Independent Sign Language Recognition is a complex visual recognition problem that combines several challenging tasks of Computer Vision due to the necessity to exploit and fuse information from hand gestures, body features and facial expressions. While many state-of-the-art works have managed to deeply elaborate on these features independently, to the best of our knowledge, no work has adequately combined all three information channels to efficiently recognize Sign Language. In this work, we employ SMPL-X, a contemporary parametric model that enables joint extraction of 3D body shape, face and hands information from a single image. We use this holistic 3D reconstruction for SLR, demonstrating that it leads to higher accuracy than recognition from raw RGB images and their optical flow fed into the state-of-the-art I3D-type network for 3D action recognition and from 2D Openpose skeletons fed into a Recurrent Neural Network. Finally, a set of experiments on the body, face and hand features showed that neglecting any of these, significantly reduces the classification accuracy, proving the importance of jointly modeling body shape, facial expression and hand pose for Sign Language Recognition.

* Submitted to ICASSP 2021