Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"photo": models, code, and papers

Smartphone Camera De-identification while Preserving Biometric Utility

Sep 17, 2020
Sudipta Banerjee, Arun Ross

The principle of Photo Response Non Uniformity (PRNU) is often exploited to deduce the identity of the smartphone device whose camera or sensor was used to acquire a certain image. In this work, we design an algorithm that perturbs a face image acquired using a smartphone camera such that (a) sensor-specific details pertaining to the smartphone camera are suppressed (sensor anonymization); (b) the sensor pattern of a different device is incorporated (sensor spoofing); and (c) biometric matching using the perturbed image is not affected (biometric utility). We employ a simple approach utilizing Discrete Cosine Transform to achieve the aforementioned objectives. Experiments conducted on the MICHE-I and OULU-NPU datasets, which contain periocular and facial data acquired using 12 smartphone cameras, demonstrate the efficacy of the proposed de-identification algorithm on three different PRNU-based sensor identification schemes. This work has application in sensor forensics and personal privacy.

* Proc. of 10th IEEE International Conference on Biometrics: Theory, Applications and Systems (BTAS), (Tampa, USA), September 2019 
Access Paper or Ask Questions

Mapping Low-Resolution Images To Multiple High-Resolution Images Using Non-Adversarial Mapping

Jun 30, 2020
Vasileios Lioutas

Several methods have recently been proposed for the Single Image Super-Resolution (SISR) problem. The current methods assume that a single low-resolution image can only yield a single high-resolution image. In addition, all of these methods use low-resolution images that were artificially generated through simple bilinear down-sampling. We argue that, first and foremost, the problem of SISR is an one-to-many mapping problem between the low-resolution and all possible candidate high-resolution images and we address the challenging task of learning how to realistically degrade and down-sample high-resolution images. To circumvent this problem, we propose SR-NAM which utilizes the Non-Adversarial Mapping (NAM) technique. Furthermore, we propose a degradation model that learns how to transform high-resolution images to low-resolution images that resemble realistically taken low-resolution photos. Finally, some qualitative results for the proposed method along with the weaknesses of SR-NAM are included.

* Paper completed in April 2019 
Access Paper or Ask Questions

Deep Exposure Fusion with Deghosting via Homography Estimation and Attention Learning

Apr 20, 2020
Sheng-Yeh Chen, Yung-Yu Chuang

Modern cameras have limited dynamic ranges and often produce images with saturated or dark regions using a single exposure. Although the problem could be addressed by taking multiple images with different exposures, exposure fusion methods need to deal with ghosting artifacts and detail loss caused by camera motion or moving objects. This paper proposes a deep network for exposure fusion. For reducing the potential ghosting problem, our network only takes two images, an underexposed image and an overexposed one. Our network integrates together homography estimation for compensating camera motion, attention mechanism for correcting remaining misalignment and moving pixels, and adversarial learning for alleviating other remaining artifacts. Experiments on real-world photos taken using handheld mobile phones show that the proposed method can generate high-quality images with faithful detail and vivid color rendition in both dark and bright areas.

* ICASSP 2020 
Access Paper or Ask Questions

Image Manipulation with Natural Language using Two-sidedAttentive Conditional Generative Adversarial Network

Dec 16, 2019
Dawei Zhu, Aditya Mogadala, Dietrich Klakow

Altering the content of an image with photo editing tools is a tedious task for an inexperienced user. Especially, when modifying the visual attributes of a specific object in an image without affecting other constituents such as background etc. To simplify the process of image manipulation and to provide more control to users, it is better to utilize a simpler interface like natural language. Therefore, in this paper, we address the challenge of manipulating images using natural language description. We propose the Two-sidEd Attentive conditional Generative Adversarial Network (TEA-cGAN) to generate semantically manipulated images while preserving other contents such as background intact. TEA-cGAN uses fine-grained attention both in the generator and discriminator of Generative Adversarial Network (GAN) based framework at different scales. Experimental results show that TEA-cGAN which generates 128x128 and 256x256 resolution images outperforms existing methods on CUB and Oxford-102 datasets both quantitatively and qualitatively.

* Submitted to Journal 
Access Paper or Ask Questions

CG-GAN: An Interactive Evolutionary GAN-based Approach for Facial Composite Generation)

Nov 28, 2019
Nicola Zaltron, Luisa Zurlo, Sebastian Risi

Facial composites are graphical representations of an eyewitness's memory of a face. Many digital systems are available for the creation of such composites but are either unable to reproduce features unless previously designed or do not allow holistic changes to the image. In this paper, we improve the efficiency of composite creation by removing the reliance on expert knowledge and letting the system learn to represent faces from examples. The novel approach, Composite Generating GAN (CG-GAN), applies generative and evolutionary computation to allow casual users to easily create facial composites. Specifically, CG-GAN utilizes the generator network of a pg-GAN to create high-resolution human faces. Users are provided with several functions to interactively breed and edit faces. CG-GAN offers a novel way of generating and handling static and animated photo-realistic facial composites, with the possibility of combining multiple representations of the same perpetrator, generated by different eyewitnesses.

Access Paper or Ask Questions

Text-to-Image Synthesis Based on Machine Generated Captions

Oct 09, 2019
Marco Menardi, Alex Falcon, Saida S. Mohamed, Lorenzo Seidenari, Giuseppe Serra, Alberto Del Bimbo, Carlo Tasso

Text to Image Synthesis refers to the process of automatic generation of a photo-realistic image starting from a given text and is revolutionizing many real-world applications. In order to perform such process it is necessary to exploit datasets containing captioned images, meaning that each image is associated with one (or more) captions describing it. Despite the abundance of uncaptioned images datasets, the number of captioned datasets is limited. To address this issue, in this paper we propose an approach capable of generating images starting from a given text using conditional GANs trained on uncaptioned images dataset. In particular, uncaptioned images are fed to an Image Captioning Module to generate the descriptions. Then, the GAN Module is trained on both the input image and the machine-generated caption. To evaluate the results, the performance of our solution is compared with the results obtained by the unconditional GAN. For the experiments, we chose to use the uncaptioned dataset LSUN bedroom. The results obtained in our study are preliminary but still promising.

Access Paper or Ask Questions

One-to-one Mapping for Unpaired Image-to-image Translation

Sep 16, 2019
Zengming Shen, S. Kevin Zhou, Yifan Chen, Bogdan Georgescu, Xuqi Liu, Thomas S. Huang

Recently image-to-image translation has attracted significant interests in the literature, starting from the successful use of the generative adversarial network (GAN), to the introduction of cyclic constraint, to extensions to multiple domains. However, in existing approaches, there is no guarantee that the mapping between two image domains is unique or one-to-one. Here we propose a self-inverse network learning approach for unpaired image-to-image translation. Building on top of CycleGAN, we learn a self-inverse function by simply augmenting the training samples by switching inputs and outputs during training. The outcome of such learning is a proven one-to-one mapping function. Our extensive experiments on a variety of detests, including cross-modal medical image synthesis, object transfiguration, and semantic labeling, consistently demonstrate clear improvement over the CycleGAN method both qualitatively and quantitatively. Especially our proposed method reaches the state-of-the-art result on the label to photo direction of the cityscapes benchmark dataset.

* 8 pages, 6 figures 
Access Paper or Ask Questions

Content and Colour Distillation for Learning Image Translations with the Spatial Profile Loss

Aug 01, 2019
M. Saquib Sarfraz, Constantin Seibold, Haroon Khalid, Rainer Stiefelhagen

Generative adversarial networks has emerged as a defacto standard for image translation problems. To successfully drive such models, one has to rely on additional networks e.g., discriminators and/or perceptual networks. Training these networks with pixel based losses alone are generally not sufficient to learn the target distribution. In this paper, we propose a novel method of computing the loss directly between the source and target images that enable proper distillation of shape/content and colour/style. We show that this is useful in typical image-to-image translations allowing us to successfully drive the generator without relying on additional networks. We demonstrate this on many difficult image translation problems such as image-to-image domain mapping, single image super-resolution and photo realistic makeup transfer. Our extensive evaluation shows the effectiveness of the proposed formulation and its ability to synthesize realistic images. [Code release:]

* BMVC 2019 
Access Paper or Ask Questions

Monocular Plan View Networks for Autonomous Driving

May 16, 2019
Dequan Wang, Coline Devin, Qi-Zhi Cai, Philipp Krähenbühl, Trevor Darrell

Convolutions on monocular dash cam videos capture spatial invariances in the image plane but do not explicitly reason about distances and depth. We propose a simple transformation of observations into a bird's eye view, also known as plan view, for end-to-end control. We detect vehicles and pedestrians in the first person view and project them into an overhead plan view. This representation provides an abstraction of the environment from which a deep network can easily deduce the positions and directions of entities. Additionally, the plan view enables us to leverage advances in 3D object detection in conjunction with deep policy learning. We evaluate our monocular plan view network on the photo-realistic Grand Theft Auto V simulator. A network using both a plan view and front view causes less than half as many collisions as previous detection-based methods and an order of magnitude fewer collisions than pure pixel-based policies.

* 8 pages, 9 figures 
Access Paper or Ask Questions