"photo": models, code, and papers

A database for face presentation attack using wax figure faces

Jun 06, 2019
Shan Jia, Chuanbo Hu, Guodong Guo, Zhengquan Xu

Compared to 2D face presentation attacks (e.g. printed photos and video replays), 3D attacks are more challenging for face recognition systems (FRS) because they present 3D characteristics or materials similar to real faces. Existing 3D face spoofing databases, however, are mostly based on 3D masks and are restricted to small data size or poor authenticity due to production difficulty and high cost. In this work, we introduce the first wax figure face database, WFFD, as one type of super-realistic 3D presentation attack to spoof FRS. This database consists of 2200 images containing both real and wax figure faces (4400 faces in total), collected online with high diversity. Experiments on this database first investigate the vulnerability of three popular FRS to this new kind of attack. Further, we evaluate the performance of several face presentation attack detection methods to show the attack abilities of this super-realistic face spoofing database.

  

Hallucinating very low-resolution and obscured face images

Dec 12, 2018
Lianping Yang, Bin Shao, Ting Sun, Song Ding, Xiangde Zhang

Most face hallucination methods are designed for complete inputs. They do not work well if the inputs are very small or contaminated by large occlusions. Motivated by this limitation, we propose an obscured face hallucination network (OFHNet). The OFHNet consists of four parts: an inpainting network, an upsampling network, a discriminative network, and a fixed facial landmark detection network. The inpainting network restores the low-resolution (LR) obscured face images. The upsampling network then upsamples the output of the inpainting network. To make the generated high-resolution (HR) face images more photo-realistic, we use the discriminative network and the facial landmark detection network to refine the output of the upsampling network. In addition, we present a semantic structure loss, which makes the generated HR face images more pleasing. Extensive experiments show that our framework can restore appealing HR face images from LR face images with one quarter of their area missing, at a challenging scaling factor of 8x.

* 20 pages, Submitted to Pattern Recognition Letters 
  

Progressive Structure from Motion

Jul 10, 2018
Alex Locher, Michal Havlena, Luc Van Gool

Structure from Motion, or the sparse 3D reconstruction from individual photos, is a long-studied topic in computer vision. Yet none of the existing reconstruction pipelines fully addresses a progressive scenario where images only become available during the reconstruction process and intermediate results are delivered to the user. Incremental pipelines are capable of growing a 3D model but often get stuck in local minima due to wrong (binding) decisions taken based on incomplete information. Global pipelines, on the other hand, need access to the complete viewgraph and are not capable of delivering intermediate results. In this paper we propose a new reconstruction pipeline that works in a progressive manner rather than in a batch processing scheme. The pipeline is able to recover from failed reconstructions in early stages, avoids taking binding decisions, delivers a progressive output, and yet maintains the capabilities of existing pipelines. We demonstrate and evaluate our method on diverse, challenging public and dedicated datasets, including those with highly symmetric structures, and compare it to the state of the art.

* Accepted to ECCV 2018 
  

Free LSD: Prior-Free Visual Landing Site Detection for Autonomous Planes

Feb 25, 2018
Timo Hinzmann, Thomas Stastny, Cesar Cadena, Roland Siegwart, Igor Gilitschenski

Full autonomy for fixed-wing unmanned aerial vehicles (UAVs) requires the capability to autonomously detect potential landing sites in unknown and unstructured terrain, allowing for self-governed mission completion or handling of emergency situations. In this work, we propose a perception system addressing this challenge by detecting landing sites based on their texture and geometric shape without using any prior knowledge about the environment. The proposed method considers hazards within the landing region such as terrain roughness and slope, surrounding obstacles that obscure the landing approach path, and the local wind field that is estimated by the on-board EKF. The latter enables applicability of the proposed method on small-scale autonomous planes without landing gear. A safe approach path is computed based on the UAV dynamics, expected state estimation and actuator uncertainty, and the on-board computed elevation map. The proposed framework has been successfully tested on photo-realistic synthetic datasets and in challenging real-world environments.

* Accepted for publication in IEEE International Conference on Robotics and Automation (ICRA), 2018, Brisbane and IEEE Robotics and Automation Letters (RA-L), 2018 
  

WAYLA - Generating Images from Eye Movements

Nov 21, 2017
Bingqing Yu, James J. Clark

We present a method for reconstructing images viewed by observers based only on their eye movements. By exploring the relationships between gaze patterns and image stimuli, the "What Are You Looking At?" (WAYLA) system learns to synthesize photo-realistic images that are similar to the original pictures being viewed. The WAYLA approach is based on the Conditional Generative Adversarial Network (Conditional GAN) image-to-image translation technique of Isola et al. We consider two specific applications - the first, of reconstructing newspaper images from gaze heat maps, and the second, of detailed reconstruction of images containing only text. The newspaper image reconstruction process is divided into two image-to-image translation operations, the first mapping gaze heat maps into image segmentations, and the second mapping the generated segmentation into a newspaper image. We validate the performance of our approach using various evaluation metrics, along with human visual inspection. All results confirm the ability of our network to perform image generation tasks using eye tracking data.

  

Seeing through the Human Reporting Bias: Visual Classifiers from Noisy Human-Centric Labels

Apr 12, 2016
Ishan Misra, C. Lawrence Zitnick, Margaret Mitchell, Ross Girshick

When human annotators are given a choice about what to label in an image, they apply their own subjective judgments on what to ignore and what to mention. We refer to these noisy "human-centric" annotations as exhibiting human reporting bias. Examples of such annotations include image tags and keywords found on photo sharing sites, or in datasets containing image captions. In this paper, we use these noisy annotations for learning visually correct image classifiers. Such annotations do not use consistent vocabulary, and miss a significant amount of the information present in an image; however, we demonstrate that the noise in these annotations exhibits structure and can be modeled. We propose an algorithm to decouple the human reporting bias from the correct visually grounded labels. Our results are highly interpretable for reporting "what's in the image" versus "what's worth saying." We demonstrate the algorithm's efficacy along a variety of metrics and datasets, including MS COCO and Yahoo Flickr 100M. We show significant improvements over traditional algorithms for both image classification and image captioning, doubling the performance of existing methods in some cases.
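
The decoupling described above can be sketched as a marginalization over a latent variable. As an assumption for illustration (these symbols do not appear in the abstract and may differ from the paper's notation), let $z \in \{0,1\}$ indicate whether a concept is visually present in image $I$, and let $y \in \{0,1\}$ indicate whether a human annotator mentions it:

```latex
p(y \mid I) \;=\; \sum_{z \in \{0,1\}} p(y \mid z, I)\, p(z \mid I)
```

Under this reading, $p(z \mid I)$ plays the role of the visually grounded classifier ("what's in the image"), while $p(y \mid z, I)$ models the reporting behavior ("what's worth saying"). Only $y$ is observed in the noisy annotations, so $z$ has to be treated as latent during training.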

* To appear in CVPR 2016 
  

Link Prediction by De-anonymization: How We Won the Kaggle Social Network Challenge

Feb 22, 2011
Arvind Narayanan, Elaine Shi, Benjamin I. P. Rubinstein

This paper describes the winning entry to the IJCNN 2011 Social Network Challenge run by Kaggle.com. The goal of the contest was to promote research on real-world link prediction, and the dataset was a graph obtained by crawling the popular Flickr social photo sharing website, with user identities scrubbed. By de-anonymizing much of the competition test set using our own Flickr crawl, we were able to effectively game the competition. Our attack represents a new application of de-anonymization to gaming machine learning contests, suggesting changes in how future competitions should be run. We introduce a new simulated annealing-based weighted graph matching algorithm for the seeding step of de-anonymization. We also show how to combine de-anonymization with link prediction (the latter is required to achieve good performance on the portion of the test set not de-anonymized), for example by training the predictor on the de-anonymized portion of the test set, and combining probabilistic predictions from de-anonymization and link prediction.

* 11 pages, 13 figures; submitted to IJCNN'2011 
  

Mesoscopic modeling of hidden spiking neurons

May 26, 2022
Shuqi Wang, Valentin Schmutz, Guillaume Bellec, Wulfram Gerstner

Can we use spiking neural networks (SNN) as generative models of multi-neuronal recordings, while taking into account that most neurons are unobserved? Modeling the unobserved neurons with large pools of hidden spiking neurons leads to severely underconstrained problems that are hard to tackle with maximum likelihood estimation. In this work, we use coarse-graining and mean-field approximations to derive a bottom-up, neuronally-grounded latent variable model (neuLVM), where the activity of the unobserved neurons is reduced to a low-dimensional mesoscopic description. In contrast to previous latent variable models, neuLVM can be explicitly mapped to a recurrent, multi-population SNN, giving it a transparent biological interpretation. We show, on synthetic spike trains, that a few observed neurons are sufficient for neuLVM to perform efficient model inversion of large SNNs, in the sense that it can recover connectivity parameters, infer single-trial latent population activity, reproduce ongoing metastable dynamics, and generalize when subjected to perturbations mimicking photo-stimulation.

* 22 pages, 7 figures 
  

Intrinsic Image Transfer for Illumination Manipulation

Jul 01, 2021
Junqing Huang, Michael Ruzhansky, Qianying Zhang, Haihui Wang

This paper presents a novel intrinsic image transfer (IIT) algorithm for illumination manipulation, which creates a local image translation between two illumination surfaces. This model is built on an optimization-based framework consisting of three photo-realistic losses defined on the sub-layers factorized by an intrinsic image decomposition. We illustrate that all losses can be reduced without performing an intrinsic image decomposition, under the well-known prior of spatially varying illumination and illumination-invariant reflectance. Moreover, with a series of relaxations, all of them can be directly defined on images, giving a closed-form solution for image illumination manipulation. This new paradigm differs from the prevailing Retinex-based algorithms, as it provides an implicit way to deal with the per-pixel image illumination. Finally, we demonstrate its versatility and benefits for illumination-related tasks such as illumination compensation, image enhancement, and high dynamic range (HDR) image compression, and show high-quality results on natural image datasets.

  

Forensic Analysis of Video Files Using Metadata

May 13, 2021
Ziyue Xiang, János Horváth, Sriram Baireddy, Paolo Bestagini, Stefano Tubaro, Edward J. Delp

The unprecedented ease and ability to manipulate video content has led to a rapid spread of manipulated media. The availability of video editing tools has greatly increased in recent years, allowing one to easily generate photo-realistic alterations. Such manipulations can leave traces in the metadata embedded in video files. This metadata can be used to detect video manipulations and to identify the brand of the recording device, the type of video editing tool, and other important evidence. In this paper, we focus on the metadata contained in the popular MP4 video wrapper/container. We describe a metadata extraction method that uses the MP4's tree structure. Our approach for analyzing the video metadata produces a more compact representation. We describe how we construct features from the metadata and then use dimensionality reduction and nearest neighbor classification for forensic analysis of a video file. Our approach allows one to visually inspect the distribution of metadata features and make decisions. The experimental results confirm that the performance of our approach surpasses other methods.
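
To make the "tree structure" of an MP4 file concrete, here is a minimal Python sketch of walking its box hierarchy. It is not the authors' extractor: it relies only on the general box layout of MP4 files (a 4-byte big-endian size followed by a 4-byte type), and the names `parse_boxes`, `flatten_paths`, `_box`, and the `CONTAINER_TYPES` subset are hypothetical choices made for this illustration. Feature construction, dimensionality reduction, and nearest neighbor classification are not shown.

```python
import struct

# Box types treated as plain containers of child boxes.
# Illustrative subset only (an assumption of this sketch), not exhaustive.
CONTAINER_TYPES = {b"moov", b"trak", b"mdia", b"minf", b"stbl", b"edts", b"dinf", b"udta"}


def parse_boxes(data, start=0, end=None):
    """Return a nested list of (type, size, children) tuples for data[start:end]."""
    end = len(data) if end is None else end
    boxes = []
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack(">I4s", data[pos:pos + 8])
        header = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            if pos + 16 > end:
                break
            size = struct.unpack(">Q", data[pos + 8:pos + 16])[0]
            header = 16
        elif size == 0:
            # A size of 0 means the box runs to the end of the enclosing span.
            size = end - pos
        if size < header or pos + size > end:
            break  # malformed or truncated box: stop rather than guess
        children = []
        if box_type in CONTAINER_TYPES:
            children = parse_boxes(data, pos + header, pos + size)
        boxes.append((box_type.decode("latin-1"), size, children))
        pos += size
    return boxes


def flatten_paths(boxes, prefix=""):
    """List every box as a slash-separated path from the root, in file order."""
    paths = []
    for box_type, _size, children in boxes:
        path = f"{prefix}/{box_type}"
        paths.append(path)
        paths.extend(flatten_paths(children, path))
    return paths


def _box(box_type, payload=b""):
    """Build one synthetic box for the demo below (header plus payload)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# Tiny synthetic byte string standing in for a real file.
sample = _box(b"ftyp", b"isom\x00\x00\x02\x00") + _box(
    b"moov", _box(b"mvhd", b"\x00" * 4) + _box(b"trak", _box(b"tkhd"))
)
paths = flatten_paths(parse_boxes(sample))
print(paths)
```

For a real file, one would read the bytes with `open(path, "rb").read()` and pass them to `parse_boxes`. The resulting path list is one simple way to expose the order and nesting of boxes, which is the kind of structural information that can vary between recording devices and editing tools.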

  