Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Topic:facial recognition

What is facial recognition? Facial recognition is an AI-based technique for identifying or confirming an individual's identity using their face. It maps facial features from an image or video and then compares the information with a collection of known faces to find a match.

Face2QR: A Unified Framework for Aesthetic, Face-Preserving, and Scannable QR Code Generation

Nov 28, 2024

Xuehao Cui, Guangyang Wu, Zhenghao Gan, Guangtao Zhai, Xiaohong Liu

Figure 1 for Face2QR: A Unified Framework for Aesthetic, Face-Preserving, and Scannable QR Code Generation

Figure 2 for Face2QR: A Unified Framework for Aesthetic, Face-Preserving, and Scannable QR Code Generation

Figure 3 for Face2QR: A Unified Framework for Aesthetic, Face-Preserving, and Scannable QR Code Generation

Figure 4 for Face2QR: A Unified Framework for Aesthetic, Face-Preserving, and Scannable QR Code Generation

Abstract:Existing methods to generate aesthetic QR codes, such as image and style transfer techniques, tend to compromise either the visual appeal or the scannability of QR codes when they incorporate human face identity. Addressing these imperfections, we present Face2QR-a novel pipeline specifically designed for generating personalized QR codes that harmoniously blend aesthetics, face identity, and scannability. Our pipeline introduces three innovative components. First, the ID-refined QR integration (IDQR) seamlessly intertwines the background styling with face ID, utilizing a unified Stable Diffusion (SD)-based framework with control networks. Second, the ID-aware QR ReShuffle (IDRS) effectively rectifies the conflicts between face IDs and QR patterns, rearranging QR modules to maintain the integrity of facial features without compromising scannability. Lastly, the ID-preserved Scannability Enhancement (IDSE) markedly boosts scanning robustness through latent code optimization, striking a delicate balance between face ID, aesthetic quality and QR functionality. In comprehensive experiments, Face2QR demonstrates remarkable performance, outperforming existing approaches, particularly in preserving facial recognition features within custom QR code designs. Codes are available at $\href{https://github.com/cavosamir/Face2QR}{\text{this URL link}}$.

Via

Access Paper or Ask Questions

7ABAW-Compound Expression Recognition via Curriculum Learning

Mar 11, 2025

Chen Liu, Feng Qiu, Wei Zhang, Lincheng Li, Dadong Wang, Xin Yu

Abstract:With the advent of deep learning, expression recognition has made significant advancements. However, due to the limited availability of annotated compound expression datasets and the subtle variations of compound expressions, Compound Emotion Recognition (CE) still holds considerable potential for exploration. To advance this task, the 7th Affective Behavior Analysis in-the-wild (ABAW) competition introduces the Compound Expression Challenge based on C-EXPR-DB, a limited dataset without labels. In this paper, we present a curriculum learning-based framework that initially trains the model on single-expression tasks and subsequently incorporates multi-expression data. This design ensures that our model first masters the fundamental features of basic expressions before being exposed to the complexities of compound emotions. Specifically, our designs can be summarized as follows: 1) Single-Expression Pre-training: The model is first trained on datasets containing single expressions to learn the foundational facial features associated with basic emotions. 2) Dynamic Compound Expression Generation: Given the scarcity of annotated compound expression datasets, we employ CutMix and Mixup techniques on the original single-expression images to create hybrid images exhibiting characteristics of multiple basic emotions. 3) Incremental Multi-Expression Integration: After performing well on single-expression tasks, the model is progressively exposed to multi-expression data, allowing the model to adapt to the complexity and variability of compound expressions. The official results indicate that our method achieves the \textbf{best} performance in this competition track with an F-score of 0.6063. Our code is released at https://github.com/YenanLiu/ABAW7th.

* Accepted by ECCVWorkshop as the report of the first place in 7th ABAW Track2 Competition

Via

Access Paper or Ask Questions

Lifting Scheme-Based Implicit Disentanglement of Emotion-Related Facial Dynamics in the Wild

Dec 17, 2024

Xingjian Wang, Li Chai

Figure 1 for Lifting Scheme-Based Implicit Disentanglement of Emotion-Related Facial Dynamics in the Wild

Figure 2 for Lifting Scheme-Based Implicit Disentanglement of Emotion-Related Facial Dynamics in the Wild

Figure 3 for Lifting Scheme-Based Implicit Disentanglement of Emotion-Related Facial Dynamics in the Wild

Figure 4 for Lifting Scheme-Based Implicit Disentanglement of Emotion-Related Facial Dynamics in the Wild

Abstract:In-the-wild Dynamic facial expression recognition (DFER) encounters a significant challenge in recognizing emotion-related expressions, which are often temporally and spatially diluted by emotion-irrelevant expressions and global context respectively. Most of the prior DFER methods model tightly coupled spatiotemporal representations which may incorporate weakly relevant features, leading to information redundancy and emotion-irrelevant context bias. Several DFER methods have highlighted the significance of dynamic information, but utilize explicit manners to extract dynamic features with overly strong prior knowledge. In this paper, we propose a novel Implicit Facial Dynamics Disentanglement framework (IFDD). Through expanding wavelet lifting scheme to fully learnable framework, IFDD disentangles emotion-related dynamic information from emotion-irrelevant global context in an implicit manner, i.e., without exploit operations and external guidance. The disentanglement process of IFDD contains two stages, i.e., Inter-frame Static-dynamic Splitting Module (ISSM) for rough disentanglement estimation and Lifting-based Aggregation-Disentanglement Module (LADM) for further refinement. Specifically, ISSM explores inter-frame correlation to generate content-aware splitting indexes on-the-fly. We preliminarily utilize these indexes to split frame features into two groups, one with greater global similarity, and the other with more unique dynamic features. Subsequently, LADM first aggregates these two groups of features to obtain fine-grained global context features by an updater, and then disentangles emotion-related facial dynamic features from the global context by a predictor. Extensive experiments on in-the-wild datasets have demonstrated that IFDD outperforms prior supervised DFER methods with higher recognition accuracy and comparable efficiency.

* 14 pages, 5 figures

Via

Access Paper or Ask Questions

PERCY: A Multimodal Dataset and Conversational System for Personalized and Emotionally Aware Human-Robot Interaction

Dec 06, 2024

Mohammed Althubyani, Zhijin Meng, Shengyuan Xie, Cha Seung, Imran Razzak, Eduardo Benitez Sandoval, Baki Kocaballi, Mahdi Bamdad, Francisco Cruz Naranjo

Figure 1 for PERCY: A Multimodal Dataset and Conversational System for Personalized and Emotionally Aware Human-Robot Interaction

Figure 2 for PERCY: A Multimodal Dataset and Conversational System for Personalized and Emotionally Aware Human-Robot Interaction

Figure 3 for PERCY: A Multimodal Dataset and Conversational System for Personalized and Emotionally Aware Human-Robot Interaction

Figure 4 for PERCY: A Multimodal Dataset and Conversational System for Personalized and Emotionally Aware Human-Robot Interaction

Abstract:The integration of conversational agents into our daily lives has become increasingly common, yet many of these agents cannot engage in deep interactions with humans. Despite this, there is a noticeable shortage of datasets that capture multimodal information from human-robot interaction dialogues. To address this gap, we have developed a Personal Emotional Robotic Conversational sYstem (PERCY) and recorded a novel multimodal dataset that encompasses rich embodied interaction data. The process involved asking participants to complete a questionnaire and gathering their profiles on ten topics, such as hobbies and favourite music. Subsequently, we initiated conversations between the robot and the participants, leveraging GPT-4 to generate contextually appropriate responses based on the participant's profile and emotional state, as determined by facial expression recognition and sentiment analysis. Automatic and user evaluations were conducted to assess the overall quality of the collected data. The results of both evaluations indicated a high level of naturalness, engagement, fluency, consistency, and relevance in the conversation, as well as the robot's ability to provide empathetic responses. It is worth noting that the dataset is derived from genuine interactions with the robot, involving participants who provided personal information and conveyed actual emotions.

* 9 pages, 5 Figures, Rejected from International Conference of Human Robot Interaction 2025, Melbourne, Australia

Via

Access Paper or Ask Questions

FABG : End-to-end Imitation Learning for Embodied Affective Human-Robot Interaction

Mar 04, 2025

Yanghai Zhang, Changyi Liu, Keting Fu, Wenbin Zhou, Qingdu Li, Jianwei Zhang

Figure 1 for FABG : End-to-end Imitation Learning for Embodied Affective Human-Robot Interaction

Figure 2 for FABG : End-to-end Imitation Learning for Embodied Affective Human-Robot Interaction

Figure 3 for FABG : End-to-end Imitation Learning for Embodied Affective Human-Robot Interaction

Figure 4 for FABG : End-to-end Imitation Learning for Embodied Affective Human-Robot Interaction

Abstract:This paper proposes FABG (Facial Affective Behavior Generation), an end-to-end imitation learning system for human-robot interaction, designed to generate natural and fluid facial affective behaviors. In interaction, effectively obtaining high-quality demonstrations remains a challenge. In this work, we develop an immersive virtual reality (VR) demonstration system that allows operators to perceive stereoscopic environments. This system ensures "the operator's visual perception matches the robot's sensory input" and "the operator's actions directly determine the robot's behaviors" - as if the operator replaces the robot in human interaction engagements. We propose a prediction-driven latency compensation strategy to reduce robotic reaction delays and enhance interaction fluency. FABG naturally acquires human interactive behaviors and subconscious motions driven by intuition, eliminating manual behavior scripting. We deploy FABG on a real-world 25-degree-of-freedom (DoF) humanoid robot, validating its effectiveness through four fundamental interaction tasks: expression response, dynamic gaze, foveated attention, and gesture recognition, supported by data collection and policy training. Project website: https://cybergenies.github.io

* Project website: https://cybergenies.github.io

Via

Access Paper or Ask Questions

From Age Estimation to Age-Invariant Face Recognition: Generalized Age Feature Extraction Using Order-Enhanced Contrastive Learning

Jan 03, 2025

Haoyi Wang, Victor Sanchez, Chang-Tsun Li, Nathan Clarke

Figure 1 for From Age Estimation to Age-Invariant Face Recognition: Generalized Age Feature Extraction Using Order-Enhanced Contrastive Learning

Figure 2 for From Age Estimation to Age-Invariant Face Recognition: Generalized Age Feature Extraction Using Order-Enhanced Contrastive Learning

Figure 3 for From Age Estimation to Age-Invariant Face Recognition: Generalized Age Feature Extraction Using Order-Enhanced Contrastive Learning

Figure 4 for From Age Estimation to Age-Invariant Face Recognition: Generalized Age Feature Extraction Using Order-Enhanced Contrastive Learning

Abstract:Generalized age feature extraction is crucial for age-related facial analysis tasks, such as age estimation and age-invariant face recognition (AIFR). Despite the recent successes of models in homogeneous-dataset experiments, their performance drops significantly in cross-dataset evaluations. Most of these models fail to extract generalized age features as they only attempt to map extracted features with training age labels directly without explicitly modeling the natural progression of aging. In this paper, we propose Order-Enhanced Contrastive Learning (OrdCon), which aims to extract generalized age features to minimize the domain gap across different datasets and scenarios. OrdCon aligns the direction vector of two features with either the natural aging direction or its reverse to effectively model the aging process. The method also leverages metric learning which is incorporated with a novel soft proxy matching loss to ensure that features are positioned around the center of each age cluster with minimum intra-class variance. We demonstrate that our proposed method achieves comparable results to state-of-the-art methods on various benchmark datasets in homogeneous-dataset evaluations for both age estimation and AIFR. In cross-dataset experiments, our method reduces the mean absolute error by about 1.38 in average for age estimation task and boosts the average accuracy for AIFR by 1.87%.

Via

Access Paper or Ask Questions

KAN See Your Face

Nov 27, 2024

Dong Han, Yong Li, Joachim Denzler

Abstract:With the advancement of face reconstruction (FR) systems, privacy-preserving face recognition (PPFR) has gained popularity for its secure face recognition, enhanced facial privacy protection, and robustness to various attacks. Besides, specific models and algorithms are proposed for face embedding protection by mapping embeddings to a secure space. However, there is a lack of studies on investigating and evaluating the possibility of extracting face images from embeddings of those systems, especially for PPFR. In this work, we introduce the first approach to exploit Kolmogorov-Arnold Network (KAN) for conducting embedding-to-face attacks against state-of-the-art (SOTA) FR and PPFR systems. Face embedding mapping (FEM) models are proposed to learn the distribution mapping relation between the embeddings from the initial domain and target domain. In comparison with Multi-Layer Perceptrons (MLP), we provide two variants, FEM-KAN and FEM-MLP, for efficient non-linear embedding-to-embedding mapping in order to reconstruct realistic face images from the corresponding face embedding. To verify our methods, we conduct extensive experiments with various PPFR and FR models. We also measure reconstructed face images with different metrics to evaluate the image quality. Through comprehensive experiments, we demonstrate the effectiveness of FEMs in accurate embedding mapping and face reconstruction.

* 16 pages, 8 figures

Via

Access Paper or Ask Questions

LightFFDNets: Lightweight Convolutional Neural Networks for Rapid Facial Forgery Detection

Nov 18, 2024

Günel Jabbarlı, Murat Kurt

Figure 1 for LightFFDNets: Lightweight Convolutional Neural Networks for Rapid Facial Forgery Detection

Figure 2 for LightFFDNets: Lightweight Convolutional Neural Networks for Rapid Facial Forgery Detection

Figure 3 for LightFFDNets: Lightweight Convolutional Neural Networks for Rapid Facial Forgery Detection

Figure 4 for LightFFDNets: Lightweight Convolutional Neural Networks for Rapid Facial Forgery Detection

Abstract:Accurate and fast recognition of forgeries is an issue of great importance in the fields of artificial intelligence, image processing and object detection. Recognition of forgeries of facial imagery is the process of classifying and defining the faces in it by analyzing real-world facial images. This process is usually accomplished by extracting features from an image, using classifier algorithms, and correctly interpreting the results. Recognizing forgeries of facial imagery correctly can encounter many different challenges. For example, factors such as changing lighting conditions, viewing faces from different angles can affect recognition performance, and background complexity and perspective changes in facial images can make accurate recognition difficult. Despite these difficulties, significant progress has been made in the field of forgery detection. Deep learning algorithms, especially Convolutional Neural Networks (CNNs), have significantly improved forgery detection performance. This study focuses on image processing-based forgery detection using Fake-Vs-Real-Faces (Hard) [10] and 140k Real and Fake Faces [61] data sets. Both data sets consist of two classes containing real and fake facial images. In our study, two lightweight deep learning models are proposed to conduct forgery detection using these images. Additionally, 8 different pretrained CNN architectures were tested on both data sets and the results were compared with newly developed lightweight CNN models. It's shown that the proposed lightweight deep learning models have minimum number of layers. It's also shown that the proposed lightweight deep learning models detect forgeries of facial imagery accurately, and computationally efficiently. Although the data set consists only of face images, the developed models can also be used in other two-class object recognition problems.

* 13 pages, 6 figures, 10 tables

Via

Access Paper or Ask Questions

WEM-GAN: Wavelet transform based facial expression manipulation

Dec 03, 2024

Dongya Sun, Yunfei Hu, Xianzhe Zhang, Yingsong Hu

Figure 1 for WEM-GAN: Wavelet transform based facial expression manipulation

Figure 2 for WEM-GAN: Wavelet transform based facial expression manipulation

Figure 3 for WEM-GAN: Wavelet transform based facial expression manipulation

Figure 4 for WEM-GAN: Wavelet transform based facial expression manipulation

Abstract:Facial expression manipulation aims to change human facial expressions without affecting face recognition. In order to transform the facial expressions to target expressions, previous methods relied on expression labels to guide the manipulation process. However, these methods failed to preserve the details of facial features, which causes the weakening or the loss of identity information in the output image. In our work, we propose WEM-GAN, in short for wavelet-based expression manipulation GAN, which puts more efforts on preserving the details of the original image in the editing process. Firstly, we take advantage of the wavelet transform technique and combine it with our generator with a U-net autoencoder backbone, in order to improve the generator's ability to preserve more details of facial features. Secondly, we also implement the high-frequency component discriminator, and use high-frequency domain adversarial loss to further constrain the optimization of our model, providing the generated face image with more abundant details. Additionally, in order to narrow the gap between generated facial expressions and target expressions, we use residual connections between encoder and decoder, while also using relative action units (AUs) several times. Extensive qualitative and quantitative experiments have demonstrated that our model performs better in preserving identity features, editing capability, and image generation quality on the AffectNet dataset. It also shows superior performance in metrics such as Average Content Distance (ACD) and Expression Distance (ED).

Via

Access Paper or Ask Questions

Facial Analysis Systems and Down Syndrome

Feb 10, 2025

Marco Rondina, Fabiana Vinci, Antonio Vetrò, Juan Carlos De Martin

Abstract:The ethical, social and legal issues surrounding facial analysis technologies have been widely debated in recent years. Key critics have argued that these technologies can perpetuate bias and discrimination, particularly against marginalized groups. We contribute to this field of research by reporting on the limitations of facial analysis systems with the faces of people with Down syndrome: this particularly vulnerable group has received very little attention in the literature so far. This study involved the creation of a specific dataset of face images. An experimental group with faces of people with Down syndrome, and a control group with faces of people who are not affected by the syndrome. Two commercial tools were tested on the dataset, along three tasks: gender recognition, age prediction and face labelling. The results show an overall lower accuracy of prediction in the experimental group, and other specific patterns of performance differences: i) high error rates in gender recognition in the category of males with Down syndrome; ii) adults with Down syndrome were more often incorrectly labelled as children; iii) social stereotypes are propagated in both the control and experimental groups, with labels related to aesthetics more often associated with women, and labels related to education level and skills more often associated with men. These results, although limited in scope, shed new light on the biases that alter face classification when applied to faces of people with Down syndrome. They confirm the structural limitation of the technology, which is inherently dependent on the datasets used to train the models.

* Machine Learning and Principles and Practice of Knowledge Discovery in Databases. ECML PKDD 2023. Communications in Computer and Information Science, vol 2133. Springer, Cham

Via

Access Paper or Ask Questions

Topic:facial recognition

Papers and Code