
"facial recognition": models, code, and papers

Exploring Racial Bias within Face Recognition via per-subject Adversarially-Enabled Data Augmentation

Apr 19, 2020
Seyma Yucer, Samet Akçay, Noura Al-Moubayed, Toby P. Breckon

Whilst face recognition applications are becoming increasingly prevalent within our daily lives, leading approaches in the field still suffer from performance bias to the detriment of some racial profiles within society. In this study, we propose a novel adversarially-derived data augmentation methodology that aims to enable dataset balance at a per-subject level via the use of image-to-image transformation for the transfer of sensitive racial characteristic facial features. Our aim is to automatically construct a synthesised dataset by transforming facial images across varying racial domains, while still preserving identity-related features, such that racially dependent features subsequently become irrelevant within the determination of subject identity. We construct our experiments on three significant face recognition variants: Softmax, CosFace and ArcFace loss over a common convolutional neural network backbone. In a side-by-side comparison, we show the positive impact our proposed technique can have on the recognition performance for (racial) minority groups within an originally imbalanced training dataset by reducing the per-race variance in performance.

* CVPR 2020 - Fair, Data Efficient and Trusted Computer Vision Workshop 
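The three loss variants named above differ only in how the target-class logit is formed from the cosine between the embedding and the class weight vector. A minimal NumPy sketch of that difference; the scale `s`, margin `m`, class count, and embedding size are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def margin_logits(embedding, weights, label, s=64.0, m=0.35, variant="cosface"):
    """Scaled class logits for a single L2-normalised embedding.

    Softmax uses the raw cosine; CosFace subtracts an additive cosine
    margin m from the target logit; ArcFace adds an angular margin m
    (in radians) to the target angle before taking the cosine.
    """
    emb = embedding / np.linalg.norm(embedding)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = w @ emb                       # cosine similarity to each class centre
    logits = cos.copy()                 # "softmax" leaves the cosines unchanged
    if variant == "cosface":
        logits[label] = cos[label] - m
    elif variant == "arcface":
        theta = np.arccos(np.clip(cos[label], -1.0, 1.0))
        logits[label] = np.cos(theta + m)
    return s * logits

rng = np.random.default_rng(0)
emb = rng.normal(size=8)
W = rng.normal(size=(4, 8))             # 4 hypothetical identity classes
plain = margin_logits(emb, W, label=2, variant="softmax")
cosf = margin_logits(emb, W, label=2, variant="cosface")
```

The margin penalises only the target-class logit, which is what forces tighter per-class clusters during training.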

Exploring Facial Expressions and Affective Domains for Parkinson Detection

Dec 11, 2020
Luis Felipe Gomez-Gomez, Aythami Morales, Julian Fierrez, Juan Rafael Orozco-Arroyave

Parkinson's Disease (PD) is a neurological disorder that affects facial movements and non-verbal communication. Patients with PD present a reduction in facial movements called hypomimia which is evaluated in item 3.2 of the MDS-UPDRS-III scale. In this work, we propose to use facial expression analysis from face images based on affective domains to improve PD detection. We propose different domain adaptation techniques to exploit the latest advances in face recognition and Face Action Unit (FAU) detection. The principal contributions of this work are: (1) a novel framework to exploit deep face architectures to model hypomimia in PD patients; (2) we experimentally compare PD detection based on single images vs. image sequences while various facial expressions are evoked in the patients; (3) we explore different domain adaptation techniques to exploit existing models initially trained either for Face Recognition or to detect FAUs for the automatic discrimination between PD patients and healthy subjects; and (4) a new approach to use triplet-loss learning to improve hypomimia modeling and PD detection. The results on real face images from PD patients show that we are able to properly model evoked emotions using image sequences (neutral, onset-transition, apex, offset-transition, and neutral) with accuracy improvements up to 5.5% (from 72.9% to 78.4%) with respect to single-image PD detection. We also show that our proposed affective-domain adaptation provides improvements in PD detection up to 8.9% (from 78.4% to 87.3% detection accuracy).
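The triplet-loss learning in contribution (4) follows the standard formulation: pull an anchor embedding towards a same-class positive and push it away from a different-class negative by at least a margin. A minimal NumPy sketch; the margin value and toy embeddings are illustrative, not the paper's configuration:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss on embedding vectors: the loss is zero once
    the negative is further from the anchor than the positive by `margin`."""
    d_pos = np.sum((anchor - positive) ** 2)   # squared distance to same class
    d_neg = np.sum((anchor - negative) ** 2)   # squared distance to other class
    return max(d_pos - d_neg + margin, 0.0)

a = np.array([0.0, 0.0])   # anchor embedding
p = np.array([0.1, 0.0])   # positive: close to the anchor
n = np.array([1.0, 1.0])   # negative: far from the anchor
loss = triplet_loss(a, p, n)
```

With the positive already much closer than the negative, the hinge is inactive and the loss is zero; swapping positive and negative yields a positive loss that would drive the embedding update.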


Cross Attentional Audio-Visual Fusion for Dimensional Emotion Recognition

Nov 09, 2021
Gnana Praveen R, Eric Granger, Patrick Cardinal

Multimodal analysis has recently drawn much interest in affective computing, since it can improve the overall accuracy of emotion recognition over isolated uni-modal approaches. The most effective techniques for multimodal emotion recognition efficiently leverage diverse and complementary sources of information, such as facial, vocal, and physiological modalities, to provide comprehensive feature representations. In this paper, we focus on dimensional emotion recognition based on the fusion of facial and vocal modalities extracted from videos, where complex spatiotemporal relationships may be captured. Most of the existing fusion techniques rely on recurrent networks or conventional attention mechanisms that do not effectively leverage the complementary nature of audio-visual (A-V) modalities. We introduce a cross-attentional fusion approach to extract the salient features across A-V modalities, allowing for accurate prediction of continuous values of valence and arousal. Our new cross-attentional A-V fusion model efficiently leverages the inter-modal relationships. In particular, it computes cross-attention weights to focus on the more contributive features across individual modalities, and thereby combines contributive feature representations, which are then fed to fully connected layers for the prediction of valence and arousal. The effectiveness of the proposed approach is validated experimentally on videos from the RECOLA and Fatigue (private) datasets. Results indicate that our cross-attentional A-V fusion model is a cost-effective approach that outperforms state-of-the-art fusion approaches. Code is available: \url{}

* Accepted in FG2021 
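The core of cross-attentional fusion can be sketched with scaled dot-product attention: each audio frame attends over the visual frames and vice versa, and the two attended streams are concatenated before the regressor. A toy NumPy sketch of that pattern; frame counts, feature dimension, and random features are illustrative assumptions, not the authors' architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(audio, visual):
    """Cross-attention between temporally aligned modalities: audio queries
    attend over visual keys (and vice versa), so each stream is re-weighted
    by its most contributive counterpart features before fusion."""
    d = audio.shape[-1]
    w_av = softmax(audio @ visual.T / np.sqrt(d))   # audio->visual weights
    w_va = softmax(visual @ audio.T / np.sqrt(d))   # visual->audio weights
    att_a = w_av @ visual        # audio stream enriched with visual context
    att_v = w_va @ audio         # visual stream enriched with audio context
    return np.concatenate([att_a, att_v], axis=-1)  # fused representation

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 16))    # 5 aligned audio frames, 16-d features
V = rng.normal(size=(5, 16))    # 5 aligned visual frames
fused = cross_attend(A, V)      # shape (5, 32), fed to FC layers in the paper
```

Each row of the attention matrices sums to one, so the fused features are convex combinations of the other modality's frames.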

Relational Deep Feature Learning for Heterogeneous Face Recognition

Mar 02, 2020
MyeongAh Cho, Taeoh Kim, Ig-Jae Kim, Sangyoun Lee

Heterogeneous Face Recognition (HFR) is a task that matches faces across two different domains such as VIS (visible light), NIR (near-infrared), or the sketch domain. In contrast to face recognition in the visible spectrum, because of the domain discrepancy, this task requires extracting domain-invariant features or learning a common-space projection. To bridge this domain gap, we propose a graph-structured module that focuses on facial relational information to reduce the fundamental differences in domain characteristics. Since relational information is domain independent, our Relational Graph Module (RGM) performs relation modeling from node vectors that represent facial components such as lips, nose, and chin. Propagation of the generated relational graph then reduces the domain difference by transitioning from spatially correlated CNN (convolutional neural network) features to inter-dependent relational features. In addition, we propose a Node Attention Unit (NAU) that performs node-wise recalibration to focus on the more informative nodes arising from the relation-based propagation. Furthermore, we suggest a novel conditional-margin loss function (C-Softmax) for efficient projection learning on the common latent space of the embedding vector. Our module can be plugged into any pre-trained face recognition network to help overcome the limitations of a small HFR database. The proposed method shows superior performance on three different HFR databases, CASIA NIR-VIS 2.0, IIIT-D Sketch, and BUAA-VisNir, in various pre-trained networks. Furthermore, we show that our C-Softmax loss boosts HFR performance and also apply our loss to the large-scale visual face database LFW (Labeled Faces in the Wild) by learning inter-class margins adaptively.
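The relation-based propagation idea can be illustrated with a toy graph step: pairwise similarities between facial-component node vectors act as a relational graph, and one normalised propagation turns spatially local features into inter-dependent relational ones. This NumPy sketch is only a schematic stand-in for the paper's RGM; the dot-product graph, the projection matrix `proj`, and the node/feature sizes are all assumptions:

```python
import numpy as np

def relational_propagation(nodes, proj):
    """One round of relation-based propagation over facial-component node
    vectors (lips, nose, chin, ...). Pairwise dot products stand in for a
    learned relational graph, `proj` for a learned projection."""
    sim = nodes @ nodes.T                        # relation scores between components
    sim = sim - sim.max(axis=1, keepdims=True)   # stabilise the softmax
    adj = np.exp(sim) / np.exp(sim).sum(axis=1, keepdims=True)  # row-normalised graph
    return np.maximum(adj @ nodes @ proj, 0.0)   # propagate, project, ReLU

rng = np.random.default_rng(2)
nodes = rng.normal(size=(6, 8))   # 6 facial components, 8-d node vectors
proj = rng.normal(size=(8, 8))
out = relational_propagation(nodes, proj)       # relational features, same shape
```

Because each output node mixes information from all components, the representation depends on the face's internal geometry rather than on domain-specific appearance, which is the intuition behind domain-invariant relational features.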


A Novel Space-Time Representation on the Positive Semidefinite Cone for Facial Expression Recognition

Jul 20, 2017
Anis Kacem, Mohamed Daoudi, Boulbaba Ben Amor, Juan Carlos Alvarez-Paiva

In this paper, we study the problem of facial expression recognition using a novel space-time geometric representation. We describe the temporal evolution of facial landmarks as parametrized trajectories on the Riemannian manifold of positive semidefinite matrices of fixed-rank. Our representation has the advantage of naturally bringing in a second desirable quantity when comparing shapes -- the spatial covariance -- in addition to the conventional affine-shape representation. We then derive geometric and computational tools for rate-invariant analysis and adaptive re-sampling of trajectories, grounded in the Riemannian geometry of the manifold. Specifically, our approach involves three steps: 1) facial landmarks are first mapped into the Riemannian manifold of positive semidefinite matrices of rank 2, to build time-parameterized trajectories; 2) a temporal alignment is performed on the trajectories, providing a geometry-aware (dis-)similarity measure between them; 3) finally, pairwise proximity function SVM (ppfSVM) is used to classify them, incorporating the latter (dis-)similarity measure into the kernel function. We show the effectiveness of the proposed approach on four publicly available benchmarks (CK+, MMI, Oulu-CASIA, and AFEW). The results of the proposed approach are comparable to or better than the state-of-the-art methods when involving only facial landmarks.

* To appear at ICCV 2017 
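Step 1 above maps an n x 2 landmark configuration to a rank-2 positive semidefinite matrix; the Gram matrix G = A A^T is the simplest such map and is invariant to in-plane rotations of the landmarks. A schematic NumPy sketch of that property (the paper's representation additionally quotients out other nuisance transforms; the landmark count and rotation angle here are arbitrary):

```python
import numpy as np

def gram_representation(landmarks):
    """Map an n x 2 landmark configuration to the n x n Gram matrix
    G = A A^T: positive semidefinite, rank at most 2, and unchanged by
    rotations of the landmark set."""
    a = landmarks - landmarks.mean(axis=0)   # centre to remove translation
    return a @ a.T

rng = np.random.default_rng(3)
pts = rng.normal(size=(10, 2))               # 10 toy facial landmarks

G = gram_representation(pts)
theta = 0.7                                  # arbitrary in-plane rotation
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
G_rot = gram_representation(pts @ R.T)       # same Gram matrix as G
```

A trajectory of such matrices over the frames of a video is exactly the kind of curve on the fixed-rank PSD manifold that the paper aligns and classifies.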

Facial Expression Representation and Recognition Using 2DHLDA, Gabor Wavelets, and Ensemble Learning

Jul 20, 2012
Mahmoud Khademi, Mohammad H. Kiapour, Mehran Safayani, Mohammad T. Manzuri, M. Shojaei

In this paper, a novel method for representation and recognition of the facial expressions in two-dimensional image sequences is presented. We apply a variation of the two-dimensional heteroscedastic linear discriminant analysis (2DHLDA) algorithm, as an efficient dimensionality reduction technique, to the Gabor representation of the input sequence. 2DHLDA is an extension of the two-dimensional linear discriminant analysis (2DLDA) approach and it removes the assumption of equal within-class covariance. By applying 2DHLDA in two directions, we eliminate the correlations between both image columns and image rows. Then, we perform a one-dimensional LDA on the new features. This combined method can alleviate the small sample size problem and instability encountered by HLDA. Also, employing both geometric and appearance features and using an ensemble learning scheme based on data fusion, we create a classifier which can efficiently classify the facial expressions. The proposed method is robust to illumination changes and it can properly represent temporal information as well as subtle changes in facial muscles. We provide experiments on the Cohn-Kanade database that show the superiority of the proposed method. KEYWORDS: two-dimensional heteroscedastic linear discriminant analysis (2DHLDA), subspace learning, facial expression analysis, Gabor wavelets, ensemble learning.

* This paper has been withdrawn by the author due to an error in experimental results 

Evaluation of Interpretability for Deep Learning algorithms in EEG Emotion Recognition: A case study in Autism

Nov 25, 2021
Juan Manuel Mayor-Torres, Sara Medina-DeVilliers, Tessa Clarkson, Matthew D. Lerner, Giuseppe Riccardi

Current Explainable Artificial Intelligence (XAI) models have shown an evident and quantified lack of reliability for measuring feature relevance when statistically entangled features are proposed for training deep classifiers. There has been an increase in the application of Deep Learning in clinical trials to predict early diagnosis of neuro-developmental disorders, such as Autism Spectrum Disorder (ASD). However, the inclusion of more reliable saliency maps to obtain more trustworthy and interpretable metrics using neural activity features is still insufficiently mature for practical applications in diagnostics or clinical trials. Moreover, in ASD research the inclusion of deep classifiers that use neural measures to predict viewed facial emotions is relatively unexplored. Therefore, in this study we propose the evaluation of a Convolutional Neural Network (CNN) for electroencephalography (EEG)-based facial emotion recognition decoding, complemented with a novel RemOve-And-Retrain (ROAR) methodology to recover highly relevant features used in the classifier. Specifically, we compare well-known relevance maps such as Layer-Wise Relevance Propagation (LRP), PatternNet, Pattern Attribution, and Smooth-Grad Squared. This study is the first to consolidate a more transparent feature-relevance calculation for successful EEG-based facial emotion recognition using a within-subject-trained CNN in typically-developed and ASD individuals.
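The RemOve-And-Retrain idea can be illustrated end-to-end on toy data: rank features by a relevance map, ablate the top fraction, refit the model, and watch accuracy fall faster than it would under random removal if the map is faithful. This NumPy sketch substitutes a nearest-centroid classifier for the paper's CNN and a class-mean-difference score for its saliency maps; everything here (data, relevance score, classifier) is an illustrative assumption:

```python
import numpy as np

def roar_curve(x, y, relevance, fractions):
    """ROAR sketch: for each fraction, zero out the most relevant features,
    refit a nearest-centroid classifier on the ablated data ("retrain"),
    and record training accuracy."""
    order = np.argsort(relevance)[::-1]          # most relevant features first
    accs = []
    for frac in fractions:
        k = int(frac * x.shape[1])
        ablated = x.copy()
        ablated[:, order[:k]] = 0.0              # "remove" the top-k features
        c0 = ablated[y == 0].mean(axis=0)        # refit the class centroids
        c1 = ablated[y == 1].mean(axis=0)
        pred = (np.linalg.norm(ablated - c1, axis=1)
                < np.linalg.norm(ablated - c0, axis=1)).astype(int)
        accs.append((pred == y).mean())
    return accs

rng = np.random.default_rng(4)
x = rng.normal(size=(100, 10))
y = np.array([0] * 50 + [1] * 50)
x[y == 1, 0] += 3.0                              # only feature 0 is informative
relevance = np.abs(x[y == 1].mean(axis=0) - x[y == 0].mean(axis=0))
accs = roar_curve(x, y, relevance, [0.0, 0.1])   # removing 10% kills feature 0
```

Because the relevance score correctly identifies feature 0, removing the top 10% of features collapses accuracy towards chance, which is the signature ROAR looks for in a trustworthy relevance map.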


Expression Snippet Transformer for Robust Video-based Facial Expression Recognition

Sep 17, 2021
Yuanyuan Liu, Wenbin Wang, Chuanxu Feng, Haoyu Zhang, Zhe Chen, Yibing Zhan

The recent success of Transformer has provided a new direction to various visual understanding tasks, including video-based facial expression recognition (FER). By modeling visual relations effectively, Transformer has shown its power for describing complicated patterns. However, Transformer still struggles to notice subtle facial expression movements, because the expression movements of many videos can be too small to extract meaningful spatial-temporal relations and achieve robust performance. To this end, we propose to decompose each video into a series of expression snippets, each of which contains a small number of facial movements, and attempt to augment the Transformer's ability for modeling intra-snippet and inter-snippet visual relations, obtaining the Expression Snippet Transformer (EST). In particular, for intra-snippet modeling, we devise an attention-augmented snippet feature extractor (AA-SFE) to enhance the encoding of subtle facial movements of each snippet by gradually attending to more salient information. In addition, for inter-snippet modeling, we introduce a shuffled snippet order prediction (SSOP) head and a corresponding loss to improve the modeling of subtle motion changes across subsequent snippets by training the Transformer to identify shuffled snippet orders. Extensive experiments on four challenging datasets (i.e., BU-3DFE, MMI, AFEW, and DFEW) demonstrate that our EST is superior to other CNN-based methods, obtaining state-of-the-art performance.
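The two data-handling steps here, snippet decomposition and the shuffled-order self-supervision target, are easy to sketch. This NumPy version shows the mechanics only; the snippet length, frame sizes, and random "video" are illustrative, and the real EST operates on learned snippet features rather than raw frames:

```python
import numpy as np

def make_snippets(frames, snippet_len):
    """Split a frame sequence into fixed-length expression snippets
    (trailing frames that do not fill a snippet are dropped)."""
    n = (len(frames) // snippet_len) * snippet_len
    return frames[:n].reshape(-1, snippet_len, *frames.shape[1:])

def shuffled_order_task(snippets, rng):
    """SSOP-style self-supervised target: permute the snippets and keep
    the permutation as the label the model must recover."""
    perm = rng.permutation(len(snippets))
    return snippets[perm], perm

rng = np.random.default_rng(5)
video = rng.normal(size=(17, 4, 4))          # 17 frames of toy 4x4 "images"
snips = make_snippets(video, snippet_len=4)  # 4 snippets of 4 frames each
shuffled, order = shuffled_order_task(snips, rng)
```

Training a model to predict `order` from `shuffled` forces it to notice the small motion changes that distinguish consecutive snippets, which is the stated purpose of the SSOP head.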


BReG-NeXt: Facial Affect Computing Using Adaptive Residual Networks With Bounded Gradient

Apr 18, 2020
Behzad Hasani, Pooran Singh Negi, Mohammad H. Mahoor

This paper introduces BReG-NeXt, a residual-based network architecture using a function with bounded derivative instead of a simple shortcut path (a.k.a. identity mapping) in the residual units for automatic recognition of facial expressions based on the categorical and dimensional models of affect. Compared to ResNet, our proposed adaptive complex mapping results in a shallower network with fewer training parameters and floating-point operations per second (FLOPs). Adding trainable parameters to the bypass function further improves fitting and training the network and hence recognizing subtle facial expressions such as contempt with a higher accuracy. We conducted comprehensive experiments on the categorical and dimensional models of affect on the challenging in-the-wild databases of AffectNet, FER2013, and Affect-in-Wild. Our experimental results show that our adaptive complex mapping approach outperforms the original ResNet consisting of a simple identity mapping as well as other state-of-the-art methods for Facial Expression Recognition (FER). Various metrics are reported in both affect models to provide a comprehensive evaluation of our method. In the categorical model, BReG-NeXt-50 with only 3.1M training parameters and 15 MFLOPs, achieves 68.50% and 71.53% accuracy on AffectNet and FER2013 databases, respectively. In the dimensional model, BReG-NeXt achieves 0.2577 and 0.2882 RMSE value on AffectNet and Affect-in-Wild databases, respectively.

* To appear in IEEE Transactions on Affective Computing journal 
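The structural change is that the residual unit's identity bypass is replaced by a mapping whose derivative is bounded. The sketch below uses `tanh(a*x)/a` purely as a stand-in bounded-derivative function (its derivative lies in (0, 1]); the paper's actual adaptive complex mapping differs, and `a` only mimics the idea of a trainable bypass parameter:

```python
import numpy as np

def bounded_shortcut(x, a=1.0):
    """Illustrative shortcut with bounded derivative:
    d/dx [tanh(a*x)/a] = sech^2(a*x), which lies in (0, 1]."""
    return np.tanh(a * x) / a

def residual_unit(x, residual_fn, a=1.0):
    """Residual unit where ResNet's identity bypass is replaced by a
    bounded-derivative mapping, the core idea behind BReG-NeXt."""
    return bounded_shortcut(x, a) + residual_fn(x)

x = np.linspace(-3, 3, 7)
y = residual_unit(x, residual_fn=lambda z: 0.1 * z)   # toy residual branch

# Check the bound numerically: the shortcut's gradient never exceeds 1.
h = 1e-6
grad = (bounded_shortcut(x + h) - bounded_shortcut(x - h)) / (2 * h)
```

Bounding the bypass gradient keeps the backward signal through the shortcut controlled, which is what the paper credits for training a shallower network with fewer parameters.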

Video-based Facial Micro-Expression Analysis: A Survey of Datasets, Features and Algorithms

Feb 16, 2022
Xianye Ben, Yi Ren, Junping Zhang, Su-Jing Wang, Kidiyo Kpalma, Weixiao Meng, Yong-Jin Liu

Unlike conventional facial expressions, micro-expressions are involuntary and transient facial expressions capable of revealing the genuine emotions that people attempt to hide. Therefore, they can provide important information in a broad range of applications such as lie detection, criminal detection, etc. Since micro-expressions are transient and of low intensity, however, their detection and recognition are difficult and rely heavily on expert experience. Due to its intrinsic particularity and complexity, video-based micro-expression analysis is attractive but challenging, and has recently become an active area of research. Although there have been numerous developments in this area, thus far there has been no comprehensive survey that provides researchers with a systematic overview of these developments with a unified evaluation. Accordingly, in this survey paper, we first highlight the key differences between macro- and micro-expressions, then use these differences to guide our research survey of video-based micro-expression analysis in a cascaded structure, encompassing the neuropsychological basis, datasets, features, spotting algorithms, recognition algorithms, applications and evaluation of state-of-the-art approaches. For each aspect, the basic techniques, advanced developments and major challenges are addressed and discussed. Furthermore, after considering the limitations of existing micro-expression datasets, we present and release a new dataset - called micro-and-macro expression warehouse (MMEW) - containing more video samples and more labeled emotion types. We then perform a unified comparison of representative methods on CAS(ME)2 for spotting, and on MMEW and SAMM for recognition, respectively. Finally, some potential future research directions are explored and outlined.