Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"photo": models, code, and papers

Neural Photo Editing with Introspective Adversarial Networks

Feb 06, 2017
Andrew Brock, Theodore Lim, J. M. Ritchie, Nick Weston

The increasingly photorealistic sample quality of generative image models suggests their feasibility in applications beyond image generation. We present the Neural Photo Editor, an interface that leverages the power of generative neural networks to make large, semantically coherent changes to existing images. To tackle the challenge of achieving accurate reconstructions without loss of feature quality, we introduce the Introspective Adversarial Network, a novel hybridization of the VAE and GAN. Our model efficiently captures long-range dependencies through use of a computational block based on weight-shared dilated convolutions, and improves generalization performance with Orthogonal Regularization, a novel weight regularization method. We validate our contributions on CelebA, SVHN, and CIFAR-100, and produce samples and reconstructions with high visual fidelity.

* 10 pages, 7 figures, 3 tables 
Access Paper or Ask Questions

Unsupervised routine discovery in egocentric photo-streams

May 10, 2019
Estefania Talavera, Nicolai Petkov, Petia Radeva

The routine of a person is defined by the occurrence of activities throughout different days, and can directly affect the person's health. In this work, we address the recognition of routine related days. To do so, we rely on egocentric images, which are recorded by a wearable camera and allow to monitor the life of the user from a first-person view perspective. We propose an unsupervised model that identifies routine related days, following an outlier detection approach. We test the proposed framework over a total of 72 days in the form of photo-streams covering around 2 weeks of the life of 5 different camera wearers. Our model achieves an average of 76% Accuracy and 68% Weighted F-Score for all the users. Thus, we show that our framework is able to recognise routine related days and opens the door to the understanding of the behaviour of people.

Access Paper or Ask Questions

Semantic Summarization of Egocentric Photo Stream Events

Aug 18, 2017
Aniol Lidon, Marc Bolaños, Mariella Dimiccoli, Petia Radeva, Maite Garolera, Xavier Giró-i-Nieto

With the rapid increase of users of wearable cameras in recent years and of the amount of data they produce, there is a strong need for automatic retrieval and summarization techniques. This work addresses the problem of automatically summarizing egocentric photo streams captured through a wearable camera by taking an image retrieval perspective. After removing non-informative images by a new CNN-based filter, images are ranked by relevance to ensure semantic diversity and finally re-ranked by a novelty criterion to reduce redundancy. To assess the results, a new evaluation metric is proposed which takes into account the non-uniqueness of the solution. Experimental results applied on a database of 7,110 images from 6 different subjects and evaluated by experts gave 95.74% of experts satisfaction and a Mean Opinion Score of 4.57 out of 5.0. Source code is available at

* Oral paper at the ACM Multimedia 2017 Workshop on Lifelogging Tools and Applications (LTA), Mountain View, California USA. 
Access Paper or Ask Questions

Eating Habits Discovery in Egocentric Photo-streams

Sep 16, 2020
Estefania Talavera, Andreea Glavan, Alina Matei, Petia Radeva

Eating habits are learned throughout the early stages of our lives. However, it is not easy to be aware of how our food-related routine affects our healthy living. In this work, we address the unsupervised discovery of nutritional habits from egocentric photo-streams. We build a food-related behavioural pattern discovery model, which discloses nutritional routines from the activities performed throughout the days. To do so, we rely on Dynamic-Time-Warping for the evaluation of similarity among the collected days. Within this framework, we present a simple, but robust and fast novel classification pipeline that outperforms the state-of-the-art on food-related image classification with a weighted accuracy and F-score of 70% and 63%, respectively. Later, we identify days composed of nutritional activities that do not describe the habits of the person as anomalies in the daily life of the user with the Isolation Forest method. Furthermore, we show an application for the identification of food-related scenes when the camera wearer eats in isolation. Results have shown the good performance of the proposed model and its relevance to visualize the nutritional habits of individuals.

Access Paper or Ask Questions

Recognition of Multiple Food Items in a Single Photo for Use in a Buffet-Style Restaurant

Mar 03, 2019
Masashi Anzawa, Sosuke Amano, Yoko Yamakata, Keiko Motonaga, Akiko Kamei, Kiyoharu Aizawa

We investigate image recognition of multiple food items in a single photo, focusing on a buffet restaurant application, where menu changes at every meal, and only a few images per class are available. After detecting food areas, we perform hierarchical recognition. We evaluate our results, comparing to two baseline methods.

* IEICE TRANSACTIONS on Information and Systems, 2019 
* 5 pages, 7 figures 
Access Paper or Ask Questions

Studio2Shop: from studio photo shoots to fashion articles

Jul 02, 2018
Julia Lasserre, Katharina Rasch, Roland Vollgraf

Fashion is an increasingly important topic in computer vision, in particular the so-called street-to-shop task of matching street images with shop images containing similar fashion items. Solving this problem promises new means of making fashion searchable and helping shoppers find the articles they are looking for. This paper focuses on finding pieces of clothing worn by a person in full-body or half-body images with neutral backgrounds. Such images are ubiquitous on the web and in fashion blogs, and are typically studio photos, we refer to this setting as studio-to-shop. Recent advances in computational fashion include the development of domain-specific numerical representations. Our model Studio2Shop builds on top of such representations and uses a deep convolutional network trained to match a query image to the numerical feature vectors of all the articles annotated in this image. Top-$k$ retrieval evaluation on test query images shows that the correct items are most often found within a range that is sufficiently small for building realistic visual search engines for the studio-to-shop setting.

* Proceedings of the 7th International Conference on Pattern Recognition Applications and Methods (January 16-18, 2018, in Funchal, Madeira, Portugal), Vol. 1 (ISBN 978-989-758-276-9), P. 37-48 
* 12 pages, 9 figures (Figure 1 has 5 subfigures, Figure 2 has 3 subfigures), 7 tables 
Access Paper or Ask Questions

GarmentGAN: Photo-realistic Adversarial Fashion Transfer

Mar 04, 2020
Amir Hossein Raffiee, Michael Sollami

The garment transfer problem comprises two tasks: learning to separate a person's body (pose, shape, color) from their clothing (garment type, shape, style) and then generating new images of the wearer dressed in arbitrary garments. We present GarmentGAN, a new algorithm that performs image-based garment transfer through generative adversarial methods. The GarmentGAN framework allows users to virtually try-on items before purchase and generalizes to various apparel types. GarmentGAN requires as input only two images, namely, a picture of the target fashion item and an image containing the customer. The output is a synthetic image wherein the customer is wearing the target apparel. In order to make the generated image look photo-realistic, we employ the use of novel generative adversarial techniques. GarmentGAN improves on existing methods in the realism of generated imagery and solves various problems related to self-occlusions. Our proposed model incorporates additional information during training, utilizing both segmentation maps and body key-point information. We show qualitative and quantitative comparisons to several other networks to demonstrate the effectiveness of this technique.

* 9 pages and 7 figures 
Access Paper or Ask Questions

How useful is photo-realistic rendering for visual learning?

Sep 08, 2016
Yair Movshovitz-Attias, Takeo Kanade, Yaser Sheikh

Data seems cheap to get, and in many ways it is, but the process of creating a high quality labeled dataset from a mass of data is time-consuming and expensive. With the advent of rich 3D repositories, photo-realistic rendering systems offer the opportunity to provide nearly limitless data. Yet, their primary value for visual learning may be the quality of the data they can provide rather than the quantity. Rendering engines offer the promise of perfect labels in addition to the data: what the precise camera pose is; what the precise lighting location, temperature, and distribution is; what the geometry of the object is. In this work we focus on semi-automating dataset creation through use of synthetic data and apply this method to an important task -- object viewpoint estimation. Using state-of-the-art rendering software we generate a large labeled dataset of cars rendered densely in viewpoint space. We investigate the effect of rendering parameters on estimation performance and show realism is important. We show that generalizing from synthetic data is not harder than the domain adaptation required between two real-image datasets and that combining synthetic images with a small amount of real data improves estimation accuracy.

* Published in GMDL 2016 In conjunction with ECCV 2016 
Access Paper or Ask Questions

Photo-to-Shape Material Transfer for Diverse Structures

May 09, 2022
Ruizhen Hu, Xiangyu Su, Xiangkai Chen, Oliver Van Kaick, Hui Huang

We introduce a method for assigning photorealistic relightable materials to 3D shapes in an automatic manner. Our method takes as input a photo exemplar of a real object and a 3D object with segmentation, and uses the exemplar to guide the assignment of materials to the parts of the shape, so that the appearance of the resulting shape is as similar as possible to the exemplar. To accomplish this goal, our method combines an image translation neural network with a material assignment neural network. The image translation network translates the color from the exemplar to a projection of the 3D shape and the part segmentation from the projection to the exemplar. Then, the material prediction network assigns materials from a collection of realistic materials to the projected parts, based on the translated images and perceptual similarity of the materials. One key idea of our method is to use the translation network to establish a correspondence between the exemplar and shape projection, which allows us to transfer materials between objects with diverse structures. Another key idea of our method is to use the two pairs of (color, segmentation) images provided by the image translation to guide the material assignment, which enables us to ensure the consistency in the assignment. We demonstrate that our method allows us to assign materials to shapes so that their appearances better resemble the input exemplars, improving the quality of the results over the state-of-the-art method, and allowing us to automatically create thousands of shapes with high-quality photorealistic materials. Code and data for this paper are available at

Access Paper or Ask Questions

Quality Guided Sketch-to-Photo Image Synthesis

Apr 20, 2020
Uche Osahor, Hadi Kazemi, Ali Dabouei, Nasser Nasrabadi

Facial sketches drawn by artists are widely used for visual identification applications and mostly by law enforcement agencies, but the quality of these sketches depend on the ability of the artist to clearly replicate all the key facial features that could aid in capturing the true identity of a subject. Recent works have attempted to synthesize these sketches into plausible visual images to improve visual recognition and identification. However, synthesizing photo-realistic images from sketches proves to be an even more challenging task, especially for sensitive applications such as suspect identification. In this work, we propose a novel approach that adopts a generative adversarial network that synthesizes a single sketch into multiple synthetic images with unique attributes like hair color, sex, etc. We incorporate a hybrid discriminator which performs attribute classification of multiple target attributes, a quality guided encoder that minimizes the perceptual dissimilarity of the latent space embedding of the synthesized and real image at different layers in the network and an identity preserving network that maintains the identity of the synthesised image throughout the training process. Our approach is aimed at improving the visual appeal of the synthesised images while incorporating multiple attribute assignment to the generator without compromising the identity of the synthesised image. We synthesised sketches using XDOG filter for the CelebA, WVU Multi-modal and CelebA-HQ datasets and from an auxiliary generator trained on sketches from CUHK, IIT-D and FERET datasets. Our results are impressive compared to current state of the art.

* 10 pages, 9 figures 
Access Paper or Ask Questions