Our ability to sample realistic natural images, particularly faces, has advanced by leaps and bounds in recent years, yet our ability to exert fine-tuned control over the generative process has lagged behind. If this new technology is to find practical uses, we need to achieve a level of control over generative networks which, without sacrificing realism, is on par with that seen in computer graphics and character animation. To this end we propose ConfigNet, a neural face model that allows for controlling individual aspects of output images in semantically meaningful ways and that is a significant step on the path towards finely-controllable neural rendering. ConfigNet is trained on real face images as well as synthetic face renders. Our novel method uses synthetic data to factorize the latent space into elements that correspond to the inputs of a traditional rendering pipeline, separating aspects such as head pose, facial expression, hair style, illumination, and many others which are very hard to annotate in real data. The real images, which are presented to the network without labels, extend the variety of the generated images and encourage realism. Finally, we propose an evaluation criterion using an attribute detection network combined with a user study and demonstrate state-of-the-art individual control over attributes in the output images.
We present HoloFace, an open-source framework for face alignment, head pose estimation and facial attribute retrieval for Microsoft HoloLens. HoloFace implements two state-of-the-art face alignment methods which can be used interchangeably: one running locally and one running on a remote backend. Head pose estimation is accomplished by fitting a deformable 3D model to the landmarks localized using face alignment. The head pose provides both the rotation of the head and a position in the world space. The parameters of the fitted 3D face model provide estimates of facial attributes such as mouth opening or smile. Together the above information can be used to augment the faces of people seen by the HoloLens user, and thus their interaction. Potential usage scenarios include facial recognition, emotion recognition, eye gaze tracking and many others. We demonstrate the capabilities of our framework by augmenting the faces of people seen through the HoloLens with various objects and animations.
In this paper, we propose Deep Alignment Network (DAN), a robust face alignment method based on a deep neural network architecture. DAN consists of multiple stages, where each stage improves the locations of the facial landmarks estimated by the previous stage. Our method uses entire face images at all stages, contrary to the recently proposed face alignment methods that rely on local patches. This is possible thanks to the use of landmark heatmaps which provide visual information about landmark locations estimated at the previous stages of the algorithm. The use of entire face images rather than patches allows DAN to handle face images with large variation in head pose and difficult initializations. An extensive evaluation on two publicly available datasets shows that DAN reduces the state-of-the-art failure rate by up to 70%. Our method has also been submitted for evaluation as part of the Menpo challenge.
In this work we present a face alignment pipeline based on two novel methods: weighted splitting for K-cluster Regression Forests and 3D Affine Pose Regression for face shape initialization. Our face alignment method is based on the Local Binary Feature framework, where instead of standard regression forests and pixel difference features used in the original method, we use our K-cluster Regression Forests with Weighted Splitting (KRFWS) and Pyramid HOG features. We also use KRFWS to perform Affine Pose Regression (APR) and 3D-Affine Pose Regression (3D-APR), which intend to improve the face shape initialization. APR applies a rigid 2D transform to the initial face shape that compensates for inaccuracy in the initial face location, size and in-plane rotation. 3D-APR estimates the parameters of a 3D transform that additionally compensates for out-of-plane rotation. The resulting pipeline, consisting of APR and 3D-APR followed by face alignment, shows an improvement of 20% over standard LBF on the challenging IBUG dataset, and state-of-theart accuracy on the entire 300-W dataset.