"photo": models, code, and papers

Temporally Coherent Person Matting Trained on Fake-Motion Dataset

Sep 10, 2021
Ivan Molodetskikh, Mikhail Erofeev, Andrey Moskalenko, Dmitry Vatolin

We propose a novel neural-network-based method to perform matting of videos depicting people that does not require additional user input such as trimaps. Our architecture achieves temporal stability of the resulting alpha mattes by using motion-estimation-based smoothing of image-segmentation algorithm outputs, combined with convolutional-LSTM modules on U-Net skip connections. We also propose a fake-motion algorithm that generates training clips for the video-matting network given photos with ground-truth alpha mattes and background videos. We apply random motion to photos and their mattes to simulate the movement one would find in real videos and composite the result with the background clips. This lets us train a deep neural network operating on videos in the absence of a large annotated video dataset, and provides ground-truth foreground optical flow for the training clips for use in loss functions.

* 13 pages, 5 figures 
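The fake-motion idea above can be sketched in a few lines: apply a random per-frame shift to a photo and its matte, composite the result over a background clip, and keep the known shift as ground-truth foreground flow. A minimal sketch assuming integer translation only (the paper's algorithm applies richer transforms); all names are illustrative, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility

def fake_motion_clip(photo, alpha, background_clip, max_shift=2):
    """Composite a still photo over background frames with random
    per-frame translation, returning frames, mattes, and the known
    per-frame foreground motion. Simplified sketch: only integer
    translation is simulated here."""
    frames, mattes, flows = [], [], []
    for bg in background_clip:
        dx, dy = rng.integers(-max_shift, max_shift + 1, size=2)
        fg = np.roll(np.roll(photo, dy, axis=0), dx, axis=1)
        a = np.roll(np.roll(alpha, dy, axis=0), dx, axis=1)
        # standard alpha compositing: a*fg + (1-a)*bg
        frames.append(a[..., None] * fg + (1 - a[..., None]) * bg)
        mattes.append(a)
        flows.append((dx, dy))  # ground-truth foreground optical flow
    return frames, mattes, flows
```

Because the motion is synthesized, the loss on predicted optical flow can be computed exactly, which is the point of the fake-motion construction.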

Effective Face Frontalization in Unconstrained Images

Nov 28, 2014
Tal Hassner, Shai Harel, Eran Paz, Roee Enbar

"Frontalization" is the process of synthesizing frontal-facing views of faces appearing in single unconstrained photos. Recent reports have suggested that this process may substantially boost the performance of face recognition systems by transforming the challenging problem of recognizing faces viewed from unconstrained viewpoints into the easier problem of recognizing faces in constrained, forward-facing poses. Previous frontalization methods did this by attempting to approximate the 3D facial shape for each query image. We observe that 3D face-shape estimation from unconstrained photos may be a harder problem than frontalization and can potentially introduce facial misalignments. Instead, we explore the simpler approach of using a single, unmodified 3D surface as an approximation to the shape of all input faces. We show that this leads to a straightforward, efficient, and easy-to-implement frontalization method. More importantly, it produces aesthetically appealing new frontal views and is surprisingly effective when used for face recognition and gender estimation.
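The single-surface approach reduces to a fixed lookup: for each pixel of the frontal reference view, project the shared 3D surface point through the query camera and sample the query image there. A geometric sketch under simplifying assumptions (nearest-neighbour sampling, no visibility or soft-symmetry handling, which the paper does include); all names are hypothetical.

```python
import numpy as np

def frontalize(query_img, query_cam, ref_points3d, ref_uv):
    """Warp a query face into the frontal view using one shared 3D surface.

    query_cam    -- 3x4 projection matrix estimated for the query photo
    ref_points3d -- (N, 3) points of the single reference surface
    ref_uv       -- (N, 2) integer (u, v) pixel of each point in the frontal view
    """
    h, w = query_img.shape[:2]
    out = np.zeros((ref_uv[:, 1].max() + 1, ref_uv[:, 0].max() + 1)
                   + query_img.shape[2:], dtype=query_img.dtype)
    # project the shared surface through the query camera
    pts_h = np.hstack([ref_points3d, np.ones((len(ref_points3d), 1))])
    proj = pts_h @ query_cam.T
    xy = proj[:, :2] / proj[:, 2:3]
    # nearest-neighbour sampling of the query image
    xi = np.clip(np.round(xy[:, 0]).astype(int), 0, w - 1)
    yi = np.clip(np.round(xy[:, 1]).astype(int), 0, h - 1)
    out[ref_uv[:, 1], ref_uv[:, 0]] = query_img[yi, xi]
    return out
```

Because the surface and its frontal-view coordinates are fixed for all faces, only the camera matrix changes per query, which is what makes the method simple and fast.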


MegaFace: A Million Faces for Recognition at Scale

Sep 07, 2015
D. Miller, E. Brossard, S. Seitz, I. Kemelmacher-Shlizerman

Recent face recognition experiments on the LFW benchmark show that face recognition is performing stunningly well, surpassing human recognition rates. In this paper, we study face recognition at scale. Specifically, we have collected a \textbf{Million} faces from Flickr and evaluated state-of-the-art face recognition algorithms on this dataset. We found that the performance of algorithms varies--while all perform great on LFW, once evaluated at scale, recognition rates drop drastically for most algorithms. Interestingly, the deep-learning-based approach of \cite{schroff2015facenet} performs much better, but still becomes less robust at scale. We consider both verification and identification problems, and evaluate how pose affects recognition at scale. Moreover, we ran an extensive human study on Mechanical Turk to evaluate human recognition at scale, and report the results. All the photos are Creative Commons photos and are released at \small{\url{}} for research and further experiments.

* Please see for code and data 
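The identification protocol the abstract refers to is rank-1 accuracy against a gallery padded with distractors: as the distractor set grows toward a million faces, the chance that some distractor outscores the true mate rises, which is why recognition rates drop at scale. A sketch of the standard measurement, assuming cosine similarity on precomputed features (feature extraction itself is out of scope):

```python
import numpy as np

def rank1_identification(probe_feats, gallery_feats, gallery_ids, probe_ids):
    """Rank-1 identification rate: each probe is matched against the
    gallery (true mates plus distractors) by cosine similarity; count
    how often the top match shares the probe's identity."""
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    p = probe_feats / np.linalg.norm(probe_feats, axis=1, keepdims=True)
    top = np.argmax(p @ g.T, axis=1)          # best gallery index per probe
    return float(np.mean(gallery_ids[top] == probe_ids))
```

Re-running this with progressively larger distractor galleries yields the accuracy-vs-scale curves the benchmark reports.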

Featureless 2D-3D Pose Estimation by Minimising an Illumination-Invariant Loss

Nov 03, 2010
Srimal Jayawardena, Marcus Hutter, Nathan Brewer

The problem of identifying the 3D pose of a known object from a given 2D image has important applications in computer vision, ranging from robotic vision to image analysis. Our proposed method of registering a 3D model of a known object on a given 2D photo of the object has numerous advantages over existing methods: it requires neither prior training nor learning, nor knowledge of the camera parameters, nor explicit point correspondences or matching features between the image and the model. Unlike techniques that estimate a partial 3D pose (as in an overhead view of traffic or machine parts on a conveyor belt), our method estimates the complete 3D pose of the object, works on a single static image from a given view, and handles varying and unknown lighting conditions. For this purpose we derive a novel illumination-invariant distance measure between the 2D photo and the projected 3D model, which is then minimised to find the best pose parameters. Results for vehicle pose detection are presented.

* Proc. 13th International Conf. on Digital Image Computing: Techniques and Applications (DICTA-2011) 37--44 
* 18 LaTeX pages, 7 figures 
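The key ingredient is a photo-to-rendering distance that ignores lighting, minimised over pose. As a stand-in for the paper's specific loss (which is derived differently but shares the invariance goal), normalised cross-correlation is invariant to affine illumination changes (gain and bias); the pose search below is a plain grid search, whereas the paper uses continuous optimisation.

```python
import numpy as np

def illumination_invariant_distance(photo, rendering):
    """1 - |normalised cross-correlation|: unchanged under
    rendering -> gain * rendering + bias."""
    a = photo.ravel() - photo.mean()
    b = rendering.ravel() - rendering.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        return 1.0
    return 1.0 - abs(a @ b) / denom

def best_pose(photo, render_fn, candidate_poses):
    """Pick the pose whose projected 3D model is closest to the photo
    under the illumination-invariant distance."""
    return min(candidate_poses,
               key=lambda p: illumination_invariant_distance(photo, render_fn(p)))
```

No feature matching or camera calibration enters this loop, which is the featureless property the abstract claims.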

An Ensemble Model for Face Liveness Detection

Jan 19, 2022
Shashank Shekhar, Avinash Patel, Mrinal Haloi, Asif Salim

In this paper, we present a passive method to detect face presentation attacks, a.k.a. face liveness detection, using an ensemble deep learning technique. Face liveness detection is one of the key steps in user identity verification during online onboarding/transaction processes. During identity verification, an unauthenticated user may try to bypass the verification system by several means: for example, they can capture a user's photo from social media and mount an impostor attack using printouts of the user's face or a digital photo from a mobile device, or even create a more sophisticated attack such as a video replay attack. We studied the different methods of attack and created an in-house large-scale dataset covering all these kinds of attacks to train a robust deep learning model. We propose an ensemble method in which multiple features of the face and background regions are learned to predict whether the user is bona fide or an attacker.

* Accepted and presented at MLDM 2022. To be published in Lattice journal 
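At inference time, an ensemble like the one described reduces to fusing per-region model scores into one liveness decision. A hypothetical sketch: the region models, the plain-average fusion rule, and the threshold are illustrative, not the paper's exact design.

```python
def ensemble_liveness_score(frame, models, threshold=0.5):
    """Fuse spoof/liveness scores from several models, each looking at a
    different region (e.g. face crop, background, full frame), by simple
    averaging. Each model maps a frame to a score in [0, 1], where higher
    means more likely bona fide."""
    scores = [m(frame) for m in models]
    score = sum(scores) / len(scores)
    return score, ("bona fide" if score >= threshold else "attack")
```

The background-region models matter because print and replay attacks often leave cues (paper edges, screen bezels, moiré) outside the face crop itself.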

ReenactNet: Real-time Full Head Reenactment

May 22, 2020
Mohammad Rami Koujan, Michail Christos Doukas, Anastasios Roussos, Stefanos Zafeiriou

Video-to-video synthesis is a challenging problem aiming at learning a translation function between a sequence of semantic maps and a photo-realistic video depicting the characteristics of a driving video. We propose a head-to-head system of our own implementation capable of fully transferring the human head 3D pose, facial expressions and eye gaze from a source to a target actor, while preserving the identity of the target actor. Our system produces high-fidelity, temporally smooth and photo-realistic synthetic videos, faithfully transferring the time-varying head attributes from the source to the target actor. Our proposed implementation: 1) works in real time ($\sim 20$ fps), 2) runs on a commodity laptop with a webcam as the only input, 3) is interactive, allowing the participant to drive a target person, e.g. a celebrity or politician, instantly by varying their expressions, head pose, and eye gaze, while visualising the synthesised video concurrently.

* to be published in 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020) 

On adversarial patches: real-world attack on ArcFace-100 face recognition system

Oct 15, 2019
Mikhail Pautov, Grigorii Melnikov, Edgar Kaziakhmedov, Klim Kireev, Aleksandr Petiushko

Recent works showed the vulnerability of image classifiers to adversarial attacks in the digital domain. However, the majority of attacks involve adding a small perturbation to an image to fool the classifier. Unfortunately, such procedures cannot be used to conduct a real-world attack, where adding an adversarial attribute to the photo is a more practical approach. In this paper, we study the problem of real-world attacks on face recognition systems. We examine the security of one of the best public face recognition systems, LResNet100E-IR with ArcFace loss, and propose a simple method to attack it in the physical world. The method involves creating an adversarial patch that can be printed, added as a face attribute, and photographed; the photo of a person with such an attribute is then passed to the classifier so that the classifier's recognized class changes from the correct one to the desired one. The proposed generation procedure allows projecting adversarial patches not only onto different areas of the face, such as the nose or forehead, but also onto wearable accessories, such as eyeglasses.
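The first step of any patch attack pipeline is projecting the patch into the image at a chosen face location before scoring it with the target network. A deliberately minimal sketch: the paper warps the patch with a projective transform to follow face geometry, while this version pastes it axis-aligned; the function name is illustrative.

```python
import numpy as np

def apply_patch(image, patch, top, left):
    """Return a copy of `image` with an adversarial patch pasted at
    (top, left). In the full attack this composited image is fed to the
    face recognition model, and the patch pixels are optimised to push
    the predicted identity toward the desired class."""
    out = image.copy()
    h, w = patch.shape[:2]
    out[top:top + h, left:left + w] = patch
    return out
```

Optimising the patch through this (differentiable, in the warped version) compositing step is what lets the same printed patch work when placed on the nose, forehead, or eyeglasses.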


Improving Face Anti-Spoofing by 3D Virtual Synthesis

Jan 02, 2019
Jianzhu Guo, Xiangyu Zhu, Jinchuan Xiao, Zhen Lei, Genxun Wan, Stan Z. Li

Face anti-spoofing is crucial for the security of face recognition systems. Learning-based methods, especially deep-learning-based methods, need large-scale training samples to reduce overfitting. However, acquiring spoof data is very expensive, since live faces must be re-printed and re-captured from many views. In this paper, we present a method to synthesize virtual spoof data in 3D space to alleviate this problem. Specifically, we consider a printed photo as a flat surface and mesh it into a 3D object, which is then randomly bent and rotated in 3D space. Afterward, the transformed 3D photo is rendered through perspective projection as a virtual sample. The synthetic virtual samples can significantly boost anti-spoofing performance when combined with a proposed data-balancing strategy. Our promising results open up new possibilities for advancing face anti-spoofing using cheap and large-scale synthetic data.

* 9 pages, 8 figures and 9 tables 
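The bend-rotate-project pipeline can be sketched as pure geometry: lift each pixel of the printed photo onto a bent sheet, rotate the sheet, and project it back with a pinhole camera. A sketch of the idea under stated assumptions: the parabolic bend model, yaw-only rotation, and all parameter values are illustrative, not the paper's exact formulation.

```python
import numpy as np

def bend_and_project(h, w, bend=0.2, yaw=0.3, focal=2.0):
    """Return the warped (u, v) position of every pixel of an h x w photo
    after bending it into a parabolic sheet, rotating it about the
    vertical axis by `yaw`, and applying perspective projection."""
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    x = xs / (w - 1) - 0.5                 # normalise to [-0.5, 0.5]
    y = ys / (h - 1) - 0.5
    z = bend * x ** 2                      # parabolic depth = bent paper
    # rotate the sheet about the vertical axis
    c, s = np.cos(yaw), np.sin(yaw)
    xr, zr = c * x + s * z, -s * x + c * z
    # pinhole perspective projection, camera at distance `focal`
    u = focal * xr / (zr + focal)
    v = focal * y / (zr + focal)
    return u, v
```

Sampling the photo's colors at these warped coordinates (and compositing over a scene) yields the virtual spoof sample; randomising `bend` and `yaw` per sample gives the diversity of views the abstract describes.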

Password-conditioned Anonymization and Deanonymization with Face Identity Transformers

Nov 29, 2019
Xiuye Gu, Weixin Luo, Michael S. Ryoo, Yong Jae Lee

Cameras are prevalent in our daily lives and enable many useful systems built upon computer vision technologies, such as smart cameras and home robots for service applications. However, there is also increasing societal concern, as the captured images/videos may contain privacy-sensitive information (e.g., face identity). We propose a novel face identity transformer which enables automated, photo-realistic, password-based anonymization as well as deanonymization of human faces appearing in visual data. Our face identity transformer is trained to (1) remove face identity information after anonymization, (2) make the recovery of the original face possible when given the correct password, and (3) return a wrong--but photo-realistic--face given a wrong password. Extensive experiments show that our approach enables multimodal password-conditioned face anonymization and deanonymization, without sacrificing privacy compared to existing anonymization approaches.
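The password-conditioning contract (correct password inverts the transform; a wrong password yields a different but still valid output) can be illustrated with a toy stand-in operating on a latent byte vector rather than on a face image. Purely illustrative: the paper's transformer is a learned network, not this keystream scheme, and all names here are hypothetical.

```python
import hashlib

def _keystream(password, n):
    """Derive n pseudo-random bytes from the password (SHA-256 based)."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(password.encode() + bytes([counter])).digest()
        counter += 1
    return out[:n]

def transform(latent, password):
    """Anonymize: shift each latent byte by a password-derived offset
    (mod 256). Any password produces a well-formed latent, echoing the
    'wrong-but-photo-realistic' property."""
    ks = _keystream(password, len(latent))
    return bytes((b + k) % 256 for b, k in zip(latent, ks))

def inverse(latent, password):
    """Deanonymize: exact inverse of `transform` under the same password."""
    ks = _keystream(password, len(latent))
    return bytes((b - k) % 256 for b, k in zip(latent, ks))
```

In the actual system the "valid latent for any password" property is what a decoder network provides; here it falls out of the modular arithmetic.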