Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Kevin Hernandez-Diaz

Leveraging Large-Scale Face Datasets for Deep Periocular Recognition via Ocular Cropping

Oct 30, 2025

Fernando Alonso-Fernandez, Kevin Hernandez-Diaz, Jose Maria Buades Rubio, Josef Bigun

Abstract:We focus on ocular biometrics, specifically the periocular region (the area around the eye), which offers high discrimination and minimal acquisition constraints. We evaluate three Convolutional Neural Network architectures of varying depth and complexity to assess their effectiveness for periocular recognition. The networks are trained on 1,907,572 ocular crops extracted from the large-scale VGGFace2 database. This significantly contrasts with existing works, which typically rely on small-scale periocular datasets for training having only a few thousand images. Experiments are conducted with ocular images from VGGFace2-Pose, a subset of VGGFace2 containing in-the-wild face images, and the UFPR-Periocular database, which consists of selfies captured via mobile devices with user guidance on the screen. Due to the uncontrolled conditions of VGGFace2, the Equal Error Rates (EERs) obtained with ocular crops range from 9-15%, noticeably higher than the 3-6% EERs achieved using full-face images. In contrast, UFPR-Periocular yields significantly better performance (EERs of 1-2%), thanks to higher image quality and more consistent acquisition protocols. To the best of our knowledge, these are the lowest reported EERs on the UFPR dataset to date.

* Published at IWAIPR 2025 conference

Via

Access Paper or Ask Questions

Combined CNN and ViT features off-the-shelf: Another astounding baseline for recognition

Jul 28, 2024

Fernando Alonso-Fernandez, Kevin Hernandez-Diaz, Prayag Tiwari, Josef Bigun

Figure 1 for Combined CNN and ViT features off-the-shelf: Another astounding baseline for recognition

Figure 2 for Combined CNN and ViT features off-the-shelf: Another astounding baseline for recognition

Figure 3 for Combined CNN and ViT features off-the-shelf: Another astounding baseline for recognition

Figure 4 for Combined CNN and ViT features off-the-shelf: Another astounding baseline for recognition

Abstract:We apply pre-trained architectures, originally developed for the ImageNet Large Scale Visual Recognition Challenge, for periocular recognition. These architectures have demonstrated significant success in various computer vision tasks beyond the ones for which they were designed. This work builds on our previous study using off-the-shelf Convolutional Neural Network (CNN) and extends it to include the more recently proposed Vision Transformers (ViT). Despite being trained for generic object classification, middle-layer features from CNNs and ViTs are a suitable way to recognize individuals based on periocular images. We also demonstrate that CNNs and ViTs are highly complementary since their combination results in boosted accuracy. In addition, we show that a small portion of these pre-trained models can achieve good accuracy, resulting in thinner models with fewer parameters, suitable for resource-limited environments such as mobiles. This efficiency improves if traditional handcrafted features are added as well.

* Under consideration at WIFS 2024

Via

Access Paper or Ask Questions

Deep Network Pruning: A Comparative Study on CNNs in Face Recognition

May 28, 2024

Fernando Alonso-Fernandez, Kevin Hernandez-Diaz, Jose Maria Buades Rubio, Prayag Tiwari, Josef Bigun

Figure 1 for Deep Network Pruning: A Comparative Study on CNNs in Face Recognition

Figure 2 for Deep Network Pruning: A Comparative Study on CNNs in Face Recognition

Figure 3 for Deep Network Pruning: A Comparative Study on CNNs in Face Recognition

Figure 4 for Deep Network Pruning: A Comparative Study on CNNs in Face Recognition

Abstract:The widespread use of mobile devices for all kind of transactions makes necessary reliable and real-time identity authentication, leading to the adoption of face recognition (FR) via the cameras embedded in such devices. Progress of deep Convolutional Neural Networks (CNNs) has provided substantial advances in FR. Nonetheless, the size of state-of-the-art architectures is unsuitable for mobile deployment, since they often encompass hundreds of megabytes and millions of parameters. We address this by studying methods for deep network compression applied to FR. In particular, we apply network pruning based on Taylor scores, where less important filters are removed iteratively. The method is tested on three networks based on the small SqueezeNet (1.24M parameters) and the popular MobileNetv2 (3.5M) and ResNet50 (23.5M) architectures. These have been selected to showcase the method on CNNs with different complexities and sizes. We observe that a substantial percentage of filters can be removed with minimal performance loss. Also, filters with the highest amount of output channels tend to be removed first, suggesting that high-dimensional spaces within popular CNNs are over-dimensionated.

* Submitted to Pattern Recognition Letters

Via

Access Paper or Ask Questions

Understanding and Improving CNNs with Complex Structure Tensor: A Biometrics Study

Apr 24, 2024

Kevin Hernandez-Diaz, Josef Bigun, Fernando Alonso-Fernandez

Figure 1 for Understanding and Improving CNNs with Complex Structure Tensor: A Biometrics Study

Figure 2 for Understanding and Improving CNNs with Complex Structure Tensor: A Biometrics Study

Figure 3 for Understanding and Improving CNNs with Complex Structure Tensor: A Biometrics Study

Figure 4 for Understanding and Improving CNNs with Complex Structure Tensor: A Biometrics Study

Abstract:Our study provides evidence that CNNs struggle to effectively extract orientation features. We show that the use of Complex Structure Tensor, which contains compact orientation features with certainties, as input to CNNs consistently improves identification accuracy compared to using grayscale inputs alone. Experiments also demonstrated that our inputs, which were provided by mini complex conv-nets, combined with reduced CNN sizes, outperformed full-fledged, prevailing CNN architectures. This suggests that the upfront use of orientation features in CNNs, a strategy seen in mammalian vision, not only mitigates their limitations but also enhances their explainability and relevance to thin-clients. Experiments were done on publicly available data sets comprising periocular images for biometric identification and verification (Close and Open World) using 6 State of the Art CNN architectures. We reduced SOA Equal Error Rate (EER) on the PolyU dataset by 5-26% depending on data and scenario.

* preprint manuscript

Via

Access Paper or Ask Questions

An Explainable Model-Agnostic Algorithm for CNN-based Biometrics Verification

Jul 25, 2023

Fernando Alonso-Fernandez, Kevin Hernandez-Diaz, Jose M. Buades, Prayag Tiwari, Josef Bigun

Figure 1 for An Explainable Model-Agnostic Algorithm for CNN-based Biometrics Verification

Figure 2 for An Explainable Model-Agnostic Algorithm for CNN-based Biometrics Verification

Figure 3 for An Explainable Model-Agnostic Algorithm for CNN-based Biometrics Verification

Figure 4 for An Explainable Model-Agnostic Algorithm for CNN-based Biometrics Verification

Abstract:This paper describes an adaptation of the Local Interpretable Model-Agnostic Explanations (LIME) AI method to operate under a biometric verification setting. LIME was initially proposed for networks with the same output classes used for training, and it employs the softmax probability to determine which regions of the image contribute the most to classification. However, in a verification setting, the classes to be recognized have not been seen during training. In addition, instead of using the softmax output, face descriptors are usually obtained from a layer before the classification layer. The model is adapted to achieve explainability via cosine similarity between feature vectors of perturbated versions of the input image. The method is showcased for face biometrics with two CNN models based on MobileNetv2 and ResNet50.

Via

Access Paper or Ask Questions

SqueezerFaceNet: Reducing a Small Face Recognition CNN Even More Via Filter Pruning

Jul 20, 2023

Fernando Alonso-Fernandez, Kevin Hernandez-Diaz, Jose Maria Buades Rubio, Josef Bigun

Abstract:The widespread use of mobile devices for various digital services has created a need for reliable and real-time person authentication. In this context, facial recognition technologies have emerged as a dependable method for verifying users due to the prevalence of cameras in mobile devices and their integration into everyday applications. The rapid advancement of deep Convolutional Neural Networks (CNNs) has led to numerous face verification architectures. However, these models are often large and impractical for mobile applications, reaching sizes of hundreds of megabytes with millions of parameters. We address this issue by developing SqueezerFaceNet, a light face recognition network which less than 1M parameters. This is achieved by applying a network pruning method based on Taylor scores, where filters with small importance scores are removed iteratively. Starting from an already small network (of 1.24M) based on SqueezeNet, we show that it can be further reduced (up to 40%) without an appreciable loss in performance. To the best of our knowledge, we are the first to evaluate network pruning methods for the task of face recognition.

* Published at VIII International Workshop on Artificial Intelligence and Pattern Recognition, IWAIPR 2023

Via

Access Paper or Ask Questions

One-Shot Learning for Periocular Recognition: Exploring the Effect of Domain Adaptation and Data Bias on Deep Representations

Jul 11, 2023

Kevin Hernandez-Diaz, Fernando Alonso-Fernandez, Josef Bigun

Figure 1 for One-Shot Learning for Periocular Recognition: Exploring the Effect of Domain Adaptation and Data Bias on Deep Representations

Figure 2 for One-Shot Learning for Periocular Recognition: Exploring the Effect of Domain Adaptation and Data Bias on Deep Representations

Figure 3 for One-Shot Learning for Periocular Recognition: Exploring the Effect of Domain Adaptation and Data Bias on Deep Representations

Figure 4 for One-Shot Learning for Periocular Recognition: Exploring the Effect of Domain Adaptation and Data Bias on Deep Representations

Abstract:One weakness of machine-learning algorithms is the need to train the models for a new task. This presents a specific challenge for biometric recognition due to the dynamic nature of databases and, in some instances, the reliance on subject collaboration for data collection. In this paper, we investigate the behavior of deep representations in widely used CNN models under extreme data scarcity for One-Shot periocular recognition, a biometric recognition task. We analyze the outputs of CNN layers as identity-representing feature vectors. We examine the impact of Domain Adaptation on the network layers' output for unseen data and evaluate the method's robustness concerning data normalization and generalization of the best-performing layer. We improved state-of-the-art results that made use of networks trained with biometric datasets with millions of images and fine-tuned for the target periocular dataset by utilizing out-of-the-box CNNs trained for the ImageNet Recognition Challenge and standard computer vision algorithms. For example, for the Cross-Eyed dataset, we could reduce the EER by 67% and 79% (from 1.70% and 3.41% to 0.56% and 0.71%) in the Close-World and Open-World protocols, respectively, for the periocular case. We also demonstrate that traditional algorithms like SIFT can outperform CNNs in situations with limited data or scenarios where the network has not been trained with the test classes like the Open-World mode. SIFT alone was able to reduce the EER by 64% and 71.6% (from 1.7% and 3.41% to 0.6% and 0.97%) for Cross-Eyed in the Close-World and Open-World protocols, respectively, and a reduction of 4.6% (from 3.94% to 3.76%) in the PolyU database for the Open-World and single biometric case.

* Submitted preprint to IEE Access

Via

Access Paper or Ask Questions

Image-Based Fire Detection in Industrial Environments with YOLOv4

Dec 09, 2022

Otto Zell, Joel Pålsson, Kevin Hernandez-Diaz, Fernando Alonso-Fernandez, Felix Nilsson

Abstract:Fires have destructive power when they break out and affect their surroundings on a devastatingly large scale. The best way to minimize their damage is to detect the fire as quickly as possible before it has a chance to grow. Accordingly, this work looks into the potential of AI to detect and recognize fires and reduce detection time using object detection on an image stream. Object detection has made giant leaps in speed and accuracy over the last six years, making real-time detection feasible. To our end, we collected and labeled appropriate data from several public sources, which have been used to train and evaluate several models based on the popular YOLOv4 object detector. Our focus, driven by a collaborating industrial partner, is to implement our system in an industrial warehouse setting, which is characterized by high ceilings. A drawback of traditional smoke detectors in this setup is that the smoke has to rise to a sufficient height. The AI models brought forward in this research managed to outperform these detectors by a significant amount of time, providing precious anticipation that could help to minimize the effects of fires further.

* Accepted for publication at ICPRAM

Via

Access Paper or Ask Questions

Synthetic Data for Object Classification in Industrial Applications

Dec 09, 2022

August Baaz, Yonan Yonan, Kevin Hernandez-Diaz, Fernando Alonso-Fernandez, Felix Nilsson

Figure 1 for Synthetic Data for Object Classification in Industrial Applications

Figure 2 for Synthetic Data for Object Classification in Industrial Applications

Figure 3 for Synthetic Data for Object Classification in Industrial Applications

Figure 4 for Synthetic Data for Object Classification in Industrial Applications

Abstract:One of the biggest challenges in machine learning is data collection. Training data is an important part since it determines how the model will behave. In object classification, capturing a large number of images per object and in different conditions is not always possible and can be very time-consuming and tedious. Accordingly, this work explores the creation of artificial images using a game engine to cope with limited data in the training dataset. We combine real and synthetic data to train the object classification engine, a strategy that has shown to be beneficial to increase confidence in the decisions made by the classifier, which is often critical in industrial setups. To combine real and synthetic data, we first train the classifier on a massive amount of synthetic data, and then we fine-tune it on real images. Another important result is that the amount of real images needed for fine-tuning is not very high, reaching top accuracy with just 12 or 24 images per class. This substantially reduces the requirements of capturing a great amount of real data.

* Accepted for publication at ICPRAM

Via

Access Paper or Ask Questions

Visual Detection of Personal Protective Equipment and Safety Gear on Industry Workers

Dec 09, 2022

Jonathan Karlsson, Fredrik Strand, Josef Bigun, Fernando Alonso-Fernandez, Kevin Hernandez-Diaz, Felix Nilsson

Figure 1 for Visual Detection of Personal Protective Equipment and Safety Gear on Industry Workers

Figure 2 for Visual Detection of Personal Protective Equipment and Safety Gear on Industry Workers

Figure 3 for Visual Detection of Personal Protective Equipment and Safety Gear on Industry Workers

Figure 4 for Visual Detection of Personal Protective Equipment and Safety Gear on Industry Workers

Abstract:Workplace injuries are common in today's society due to a lack of adequately worn safety equipment. A system that only admits appropriately equipped personnel can be created to improve working conditions. The goal is thus to develop a system that will improve workers' safety using a camera that will detect the usage of Personal Protective Equipment (PPE). To this end, we collected and labeled appropriate data from several public sources, which have been used to train and evaluate several models based on the popular YOLOv4 object detector. Our focus, driven by a collaborating industrial partner, is to implement our system into an entry control point where workers must present themselves to obtain access to a restricted area. Combined with facial identity recognition, the system would ensure that only authorized people wearing appropriate equipment are granted access. A novelty of this work is that we increase the number of classes to five objects (hardhat, safety vest, safety gloves, safety glasses, and hearing protection), whereas most existing works only focus on one or two classes, usually hardhats or vests. The AI model developed provides good detection accuracy at a distance of 3 and 5 meters in the collaborative environment where we aim at operating (mAP of 99/89%, respectively). The small size of some objects or the potential occlusion by body parts have been identified as potential factors that are detrimental to accuracy, which we have counteracted via data augmentation and cropping of the body before applying PPE detection.

* Accepted for publication at ICPRAM

Via

Access Paper or Ask Questions