Topic: facial recognition

What is facial recognition? Facial recognition is an AI-based technique for identifying or confirming an individual's identity using their face. It maps facial features from an image or video and then compares the information with a collection of known faces to find a match.

MSConv: Multiplicative and Subtractive Convolution for Face Recognition

Mar 08, 2025

Si Zhou, Yain-Whar Si, Xiaochen Yuan, Xiaofan Li, Xiaoxiang Liu, Xinyuan Zhang, Cong Lin, Xueyuan Gong

Abstract: In neural networks, there are various methods of feature fusion. Different strategies can significantly affect the effectiveness of feature representation, consequently influencing the ability of the model to extract representative and discriminative features. In the field of face recognition, traditional feature fusion methods include feature concatenation and feature addition. Recently, various attention mechanism-based fusion strategies have emerged. However, we found that these methods primarily focus on the important features in the image, referred to as salient features in this paper, while neglecting another equally important set of features for image recognition tasks, which we term differential features. This may cause the model to overlook critical local differences when dealing with complex facial samples. Therefore, in this paper, we propose an efficient convolution module called MSConv (Multiplicative and Subtractive Convolution), designed to balance the model's learning of salient and differential features. Specifically, we employ multi-scale mixed convolution to capture both local and broader contextual information from face images, and then utilize the Multiplication Operation (MO) and Subtraction Operation (SO) to extract salient and differential features, respectively. Experimental results demonstrate that by integrating both salient and differential features, MSConv outperforms models that only focus on salient features.
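As a rough illustration of the idea in this abstract, the sketch below combines two feature maps computed at different scales with an element-wise product and an element-wise difference. It is not the authors' MSConv module: the box filters standing in for learned convolutions, the names `box_filter` and `ms_fuse`, the kernel sizes, and the final stacking step are all assumptions made for illustration.

```python
import numpy as np

def box_filter(x: np.ndarray, k: int) -> np.ndarray:
    """Mean filter with a k x k window and edge padding.

    Stand-in for a learned convolution (an assumption of this sketch).
    """
    pad = k // 2
    xp = np.pad(x, pad, mode="edge")
    h, w = x.shape
    out = np.zeros((h, w), dtype=float)
    # Sum the k*k shifted views of the padded input, then average.
    for i in range(k):
        for j in range(k):
            out += xp[i:i + h, j:j + w]
    return out / (k * k)

def ms_fuse(x: np.ndarray, k_small: int = 3, k_large: int = 5) -> np.ndarray:
    """Combine a local and a broader-context feature map (hypothetical helper)."""
    local = box_filter(x, k_small)      # small receptive field
    context = box_filter(x, k_large)    # larger receptive field
    salient = local * context           # multiplication: responses both scales agree on
    differential = local - context      # subtraction: where the two scales differ
    return np.stack([salient, differential], axis=0)
```

On a constant input the two scales agree everywhere, so the product channel reproduces the (squared) constant and the difference channel is zero; the difference channel only becomes non-zero where local structure departs from its surroundings.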

CMED: A Child Micro-Expression Dataset

Mar 27, 2025

Nikin Matharaarachchi, Muhammad Fermi Pasha, Sonya Coleman, Kah Peng Wong

Abstract: Micro-expressions are short bursts of emotion that are difficult to hide. Their detection in children is an important cue to assist psychotherapists in conducting better therapy. However, existing research on the detection of micro-expressions has focused on adults, whose expressions differ in their characteristics from those of children. The lack of research is a direct consequence of the lack of a child-based micro-expression dataset, as it is much more challenging to capture children's facial expressions due to the lack of predictability and controllability. This study compiles a dataset of spontaneous child micro-expression videos, the first of its kind, to the best of the authors' knowledge. The dataset is captured in the wild using video conferencing software. This dataset enables us to then explore key features and differences between adult and child micro-expressions. This study also establishes a baseline for the automated spotting and recognition of micro-expressions in children using three approaches comprising hand-crafted and learning-based approaches.

Remote Photoplethysmography in Real-World and Extreme Lighting Scenarios

Mar 14, 2025

Hang Shao, Lei Luo, Jianjun Qian, Mengkai Yan, Shuo Chen, Jian Yang

Abstract: Physiological activities manifest as subtle changes in facial imaging. While these changes are barely observable to the human eye, computer vision methods can detect them, and the derived remote photoplethysmography (rPPG) has shown considerable promise. However, existing studies mainly rely on spatial skin recognition and temporal rhythmic interactions, so they focus on identifying explicit features under ideal lighting conditions but perform poorly in the wild, where there are intricate obstacles and extreme illumination exposure. In this paper, we propose an end-to-end video transformer model for rPPG. It strives to eliminate complex and unknown external time-varying interferences, whether they are strong enough to overwhelm subtle biosignal amplitudes or exist as periodic perturbations that hinder network training. In the specific implementation, we utilize global interference sharing, subject background reference, and self-supervised disentanglement to eliminate interference, and further guide learning based on spatiotemporal filtering, reconstruction guidance, and frequency domain and biological prior constraints to achieve effective rPPG. To the best of our knowledge, this is the first robust rPPG model for real outdoor scenarios based on natural face videos, and it is lightweight to deploy. Extensive experiments show the competitiveness and performance of our model in rPPG prediction across datasets and scenes.
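To make the mention of frequency-domain and biological prior constraints more concrete, here is a minimal sketch of a generic rPPG post-processing step: reading a pulse rate off the dominant spectral peak inside a plausible heart-rate band. This is a standard baseline, not the transformer model described in the abstract; the function name `estimate_bpm` and the 0.7 to 4.0 Hz band are assumptions chosen for illustration.

```python
import numpy as np

def estimate_bpm(signal, fs, lo_hz=0.7, hi_hz=4.0):
    """Estimate pulse rate (beats per minute) from a 1-D rPPG-like trace.

    fs is the sampling rate in Hz. The band limits act as a simple
    biological prior on plausible heart rates (assumed values).
    """
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()                  # remove the DC offset
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    power = np.abs(np.fft.rfft(signal)) ** 2
    band = (freqs >= lo_hz) & (freqs <= hi_hz)       # ignore out-of-band energy
    peak_hz = freqs[band][np.argmax(power[band])]
    return 60.0 * peak_hz

# Synthetic example: a 1.2 Hz pulse plus a slow 0.2 Hz drift, 10 s at 30 fps.
fs = 30.0
t = np.arange(300) / fs
trace = np.sin(2 * np.pi * 1.2 * t) + 0.5 * np.sin(2 * np.pi * 0.2 * t)
bpm = estimate_bpm(trace, fs)
```

The slow drift falls outside the band and is ignored, so the estimate follows the 1.2 Hz component. Real face videos add motion and illumination interference inside the band, which is the harder problem the abstract targets.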

Driver Assistance System Based on Multimodal Data Hazard Detection

Feb 05, 2025

Long Zhouxiang, Ovanes Petrosian

Abstract: Autonomous driving technology has advanced significantly, yet detecting driving anomalies remains a major challenge due to the long-tailed distribution of driving events. Existing methods primarily rely on single-modal road condition video data, which limits their ability to capture rare and unpredictable driving incidents. This paper proposes a multimodal driver assistance detection system that integrates road condition video, driver facial video, and audio data to enhance incident recognition accuracy. Our model employs an attention-based intermediate fusion strategy, enabling end-to-end learning without separate feature extraction. To support this approach, we develop a new three-modality dataset using a driving simulator. Experimental results demonstrate that our method effectively captures cross-modal correlations, reducing misjudgments and improving driving safety.

MVP: Multimodal Emotion Recognition based on Video and Physiological Signals

Jan 06, 2025

Valeriya Strizhkova, Hadi Kachmar, Hava Chaptoukaev, Raphael Kalandadze, Natia Kukhilava, Tatia Tsmindashvili, Nibras Abo-Alzahab, Maria A. Zuluaga, Michal Balazia, Antitza Dantcheva (+2 more)

Abstract: Human emotions entail a complex set of behavioral, physiological and cognitive changes. Current state-of-the-art models fuse the behavioral and physiological components using classic machine learning, rather than recent deep learning techniques. We propose to fill this gap by designing the Multimodal for Video and Physio (MVP) architecture, streamlined to fuse video and physiological signals. Unlike other approaches, MVP exploits the benefits of attention to enable the use of long input sequences (1-2 minutes). We have studied video and physiological backbones for inputting long sequences and evaluated our method with respect to the state-of-the-art. Our results show that MVP outperforms former methods for emotion recognition based on facial videos, EDA, and ECG/PPG.

* Preprint. Final paper accepted at Affective Behavior Analysis in-the-Wild (ABAW) at IEEE/CVF European Conference on Computer Vision (ECCV), Milan, September, 2024. 17 pages

Two-Stream Spatial-Temporal Transformer Framework for Person Identification via Natural Conversational Keypoints

Feb 28, 2025

Masoumeh Chapariniya, Hossein Ranjbar, Teodora Vukovic, Sarah Ebling, Volker Dellwo

Abstract: In the age of AI-driven generative technologies, traditional biometric recognition systems face unprecedented challenges, particularly from sophisticated deepfake and face reenactment techniques. In this study, we propose a Two-Stream Spatial-Temporal Transformer Framework for person identification using upper body keypoints visible during online conversations, which we term conversational keypoints. Our framework processes both spatial relationships between keypoints and their temporal evolution through two specialized branches: a Spatial Transformer (STR) that learns distinctive structural patterns in keypoint configurations, and a Temporal Transformer (TTR) that captures sequential motion patterns. Using the state-of-the-art Sapiens pose estimator, we extract 133 keypoints (based on COCO-WholeBody format) representing facial features, head pose, and hand positions. The framework was evaluated on a dataset of 114 individuals engaged in natural conversations, achieving recognition accuracies of 80.12% for the spatial stream and 63.61% for the temporal stream. We then explored two fusion strategies: a shared loss function approach achieving 82.22% accuracy, and a feature-level fusion method that concatenates feature maps from both streams, significantly improving performance to 94.86%. By jointly modeling both static anatomical relationships and dynamic movement patterns, our approach learns comprehensive identity signatures that are more robust to spoofing than traditional appearance-based methods.
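The feature-level fusion mentioned in this abstract can be pictured as concatenating the two streams' feature vectors before a single classification head. The sketch below shows only that step, with random arrays standing in for the transformer outputs. The function name `fuse_and_classify`, the feature width `FEAT_DIM`, and the linear head are assumptions; only the count of 114 identities comes from the abstract.

```python
import numpy as np

NUM_IDENTITIES = 114   # number of individuals reported in the abstract
FEAT_DIM = 64          # hypothetical per-stream feature width

def fuse_and_classify(spatial_feat, temporal_feat, weights, bias):
    """Concatenate per-stream features and apply one linear head."""
    fused = np.concatenate([spatial_feat, temporal_feat], axis=-1)
    logits = fused @ weights + bias
    return logits.argmax(axis=-1)      # predicted identity index per sample

# Random placeholders for the outputs of the spatial and temporal branches.
rng = np.random.default_rng(0)
spatial_feat = rng.normal(size=(4, FEAT_DIM))
temporal_feat = rng.normal(size=(4, FEAT_DIM))
weights = rng.normal(size=(2 * FEAT_DIM, NUM_IDENTITIES))
bias = np.zeros(NUM_IDENTITIES)

pred = fuse_and_classify(spatial_feat, temporal_feat, weights, bias)
```

Because the head sees both feature sets at once, it can weight static structure and motion cues jointly, in contrast to a shared-loss setup where each stream still makes its own prediction.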

DIPSER: A Dataset for In-Person Student Engagement Recognition in the Wild

Feb 27, 2025

Luis Marquez-Carpintero, Sergio Suescun-Ferrandiz, Carolina Lorenzo Álvarez, Jorge Fernandez-Herrero, Diego Viejo, Rosabel Roig-Vila, Miguel Cazorla

Abstract: In this paper, a novel dataset is introduced, designed to assess student attention within in-person classroom settings. This dataset encompasses RGB camera data, featuring multiple cameras per student to capture both posture and facial expressions, in addition to smartwatch sensor data for each individual. This dataset allows machine learning algorithms to be trained to predict attention and correlate it with emotion. A comprehensive suite of attention and emotion labels for each student is provided, generated through self-reporting as well as evaluations by four different experts. Our dataset uniquely combines facial and environmental camera data with smartwatch metrics and includes ethnicities underrepresented in similar datasets, all within in-the-wild, in-person settings, making it the most comprehensive dataset of its kind currently available. The dataset presented offers an extensive and diverse collection of data pertaining to student interactions across different educational contexts, augmented with additional metadata from other tools. This initiative addresses existing deficiencies by offering a valuable resource for the analysis of student attention and emotion in face-to-face lessons.

Frequency Matters: Explaining Biases of Face Recognition in the Frequency Domain

Jan 28, 2025

Marco Huber, Fadi Boutros, Naser Damer

Abstract: Face recognition (FR) models are vulnerable to performance variations across demographic groups. The causes for these performance differences are unclear due to the highly complex deep learning-based structure of face recognition models. Several works aimed at exploring possible roots of gender and ethnicity bias, identifying semantic reasons such as hairstyle, make-up, or facial hair as possible sources. Motivated by recent discoveries of the importance of frequency patterns in convolutional neural networks, we explain bias in face recognition using state-of-the-art frequency-based explanations. Our extensive results show that different frequencies are important to FR models depending on the ethnicity of the samples.

* Accepted at xAI4Biometrics at ECCV 2024
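One simple way to probe how much a model relies on a given frequency range is to remove a radial band from the image spectrum and compare the model's output before and after. The sketch below shows only the band-removal step and is not necessarily the explanation method used in this paper; the function name `remove_band` and the radius convention (distance in frequency bins from the zero-frequency component) are assumptions.

```python
import numpy as np

def remove_band(image: np.ndarray, r_low: float, r_high: float) -> np.ndarray:
    """Zero out spatial frequencies whose radius lies in [r_low, r_high).

    image is a 2-D grayscale array; radii are measured in frequency bins
    from the zero-frequency (DC) component.
    """
    h, w = image.shape
    # Shift the spectrum so the DC component sits at index (h // 2, w // 2).
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    yy, xx = np.mgrid[:h, :w]
    radius = np.hypot(yy - h // 2, xx - w // 2)
    keep = (radius < r_low) | (radius >= r_high)    # everything outside the band
    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * keep))
    return filtered.real
```

A typical use would be to compute face embeddings for `image` and for `remove_band(image, r_low, r_high)` over several bands and record how the similarity score changes per band and per demographic group.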

NullSwap: Proactive Identity Cloaking Against Deepfake Face Swapping

Mar 24, 2025

Tianyi Wang, Harry Cheng, Xiao Zhang, Yinglong Wang

Abstract: Because passive detection of high-quality Deepfake images faces performance bottlenecks as generative models advance, proactive perturbations offer a promising approach to disabling Deepfake manipulations by inserting signals into benign images. However, existing proactive perturbation approaches remain unsatisfactory in several aspects: 1) visual degradation due to direct element-wise addition; 2) limited effectiveness against face swapping manipulation; 3) unavoidable reliance on white- and grey-box settings to involve generative models during training. In this study, we analyze the essence of Deepfake face swapping and argue the necessity of protecting source identities rather than target images, and we propose NullSwap, a novel proactive defense approach that cloaks source image identities and nullifies face swapping under a pure black-box scenario. We design an Identity Extraction module to obtain facial identity features from the source image, while a Perturbation Block is then devised to generate identity-guided perturbations accordingly. Meanwhile, a Feature Block extracts shallow-level image features, which are then fused with the perturbation in the Cloaking Block for image reconstruction. Furthermore, to ensure adaptability across different identity extractors in face swapping algorithms, we propose Dynamic Loss Weighting to adaptively balance identity losses. Experiments demonstrate the outstanding ability of our approach to fool various identity recognition models, outperforming state-of-the-art proactive perturbations in preventing face swapping models from generating images with correct source identities.

CG-MER: A Card Game-based Multimodal dataset for Emotion Recognition

Jan 14, 2025

Nessrine Farhat, Amine Bohi, Leila Ben Letaifa, Rim Slama

Abstract: The field of affective computing has seen significant advancements in exploring the relationship between emotions and emerging technologies. This paper presents a novel and valuable contribution to this field with the introduction of a comprehensive French multimodal dataset designed specifically for emotion recognition. The dataset encompasses three primary modalities: facial expressions, speech, and gestures, providing a holistic perspective on emotions. Moreover, the dataset has the potential to incorporate additional modalities, such as Natural Language Processing (NLP), to expand the scope of emotion recognition research. The dataset was curated through engaging participants in card game sessions, where they were prompted to express a range of emotions while responding to diverse questions. The study included 10 sessions with 20 participants (9 females and 11 males). The dataset serves as a valuable resource for furthering research in emotion recognition and provides an avenue for exploring the intricate connections between human emotions and digital technologies.

* 8 pages, 2 figures and 4 tables. Sixteenth International Conference on Machine Vision (ICMV 2023), Yerevan, Armenia
