Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"photo": models, code, and papers

How To Extract Fashion Trends From Social Media? A Robust Object Detector With Support For Unsupervised Learning

Jun 28, 2018
Vijay Gabale, Anand Prabhu Subramanian

With the proliferation of social media, fashion inspired from celebrities, reputed designers as well as fashion influencers has shortened the cycle of fashion design and manufacturing. However, with the explosion of fashion related content and large number of user generated fashion photos, it is an arduous task for fashion designers to wade through social media photos and create a digest of trending fashion. This necessitates deep parsing of fashion photos on social media to localize and classify multiple fashion items from a given fashion photo. While object detection competitions such as MSCOCO have thousands of samples for each of the object categories, it is quite difficult to get large labeled datasets for fast fashion items. Moreover, state-of-the-art object detectors do not have any functionality to ingest large amount of unlabeled data available on social media in order to fine tune object detectors with labeled datasets. In this work, we show application of a generic object detector, that can be pretrained in an unsupervised manner, on 24 categories from recently released Open Images V4 dataset. We first train the base architecture of the object detector using unsupervisd learning on 60K unlabeled photos from 24 categories gathered from social media, and then subsequently fine tune it on 8.2K labeled photos from Open Images V4 dataset. On 300 X 300 image inputs, we achieve 72.7% mAP on a test dataset of 2.4K photos while performing 11% to 17% better as compared to the state-of-the-art object detectors. We show that this improvement is due to our choice of architecture that lets us do unsupervised learning and that performs significantly better in identifying small objects.

* 6 pages, 3 figures, AI for Fashion, KDD 2018 

A naive method to discover directions in the StyleGAN2 latent space

Mar 19, 2022
Andrea Giardina, Soumya Subhra Paria, Adhikari Kaustubh

Several research groups have shown that Generative Adversarial Networks (GANs) can generate photo-realistic images in recent years. Using the GANs, a map is created between a latent code and a photo-realistic image. This process can also be reversed: given a photo as input, it is possible to obtain the corresponding latent code. In this paper, we will show how the inversion process can be easily exploited to interpret the latent space and control the output of StyleGAN2, a GAN architecture capable of generating photo-realistic faces. From a biological perspective, facial features such as nose size depend on important genetic factors, and we explore the latent spaces that correspond to such biological features, including masculinity and eye colour. We show the results obtained by applying the proposed method to a set of photos extracted from the CelebA-HQ database. We quantify some of these measures by utilizing two landmarking protocols, and evaluate their robustness through statistical analysis. Finally we correlate these measures with the input parameters used to perturb the latent spaces along those interpretable directions. Our results contribute towards building the groundwork of using such GAN architecture in forensics to generate photo-realistic faces that satisfy certain biological attributes.


Interactive White Balancing for Camera-Rendered Images

Sep 26, 2020
Mahmoud Afifi, Michael S. Brown

White balance (WB) is one of the first photo-finishing steps used to render a captured image to its final output. WB is applied to remove the color cast caused by the scene's illumination. Interactive photo-editing software allows users to manually select different regions in a photo as examples of the illumination for WB correction (e.g., clicking on achromatic objects). Such interactive editing is possible only with images saved in a RAW image format. This is because RAW images have no photo-rendering operations applied and photo-editing software is able to apply WB and other photo-finishing procedures to render the final image. Interactively editing WB in camera-rendered images is significantly more challenging. This is because the camera hardware has already applied WB to the image and subsequent nonlinear photo-processing routines. These nonlinear rendering operations make it difficult to change the WB post-capture. The goal of this paper is to allow interactive WB manipulation of camera-rendered images. The proposed method is an extension of our recent work \cite{afifi2019color} that proposed a post-capture method for WB correction based on nonlinear color-mapping functions. Here, we introduce a new framework that links the nonlinear color-mapping functions directly to user-selected colors to enable {\it interactive} WB manipulation. This new framework is also more efficient in terms of memory and run-time (99\% reduction in memory and 3$\times$ speed-up). Lastly, we describe how our framework can leverage a simple illumination estimation method (i.e., gray-world) to perform auto-WB correction that is on a par with the WB correction results in \cite{afifi2019color}. The source code is publicly available at

* To appear in Color and Imaging Conference (CIC28), 2020 

Time-Travel Rephotography

Dec 22, 2020
Xuan Luo, Xuaner Zhang, Paul Yoo, Ricardo Martin-Brualla, Jason Lawrence, Steven M. Seitz

Many historical people are captured only in old, faded, black and white photos, that have been distorted by the limitations of early cameras and the passage of time. This paper simulates traveling back in time with a modern camera to rephotograph famous subjects. Unlike conventional image restoration filters which apply independent operations like denoising, colorization, and superresolution, we leverage the StyleGAN2 framework to project old photos into the space of modern high-resolution photos, achieving all of these effects in a unified framework. A unique challenge with this approach is capturing the identity and pose of the photo's subject and not the many artifacts in low-quality antique photos. Our comparisons to current state-of-the-art restoration filters show significant improvements and compelling results for a variety of important historical people.

* Project Page: Video: 

Neural Rerendering in the Wild

Apr 08, 2019
Moustafa Meshry, Dan B Goldman, Sameh Khamis, Hugues Hoppe, Rohit Pandey, Noah Snavely, Ricardo Martin-Brualla

We explore total scene capture -- recording, modeling, and rerendering a scene under varying appearance such as season and time of day. Starting from internet photos of a tourist landmark, we apply traditional 3D reconstruction to register the photos and approximate the scene as a point cloud. For each photo, we render the scene points into a deep framebuffer, and train a neural network to learn the mapping of these initial renderings to the actual photos. This rerendering network also takes as input a latent appearance vector and a semantic mask indicating the location of transient objects like pedestrians. The model is evaluated on several datasets of publicly available images spanning a broad range of illumination conditions. We create short videos demonstrating realistic manipulation of the image viewpoint, appearance, and semantic labeling. We also compare results with prior work on scene reconstruction from internet photos.

* To be presented at CVPR 2019 (oral). Supplementary video available at 

A Closed-form Solution to Photorealistic Image Stylization

Jul 27, 2018
Yijun Li, Ming-Yu Liu, Xueting Li, Ming-Hsuan Yang, Jan Kautz

Photorealistic image stylization concerns transferring style of a reference photo to a content photo with the constraint that the stylized photo should remain photorealistic. While several photorealistic image stylization methods exist, they tend to generate spatially inconsistent stylizations with noticeable artifacts. In this paper, we propose a method to address these issues. The proposed method consists of a stylization step and a smoothing step. While the stylization step transfers the style of the reference photo to the content photo, the smoothing step ensures spatially consistent stylizations. Each of the steps has a closed-form solution and can be computed efficiently. We conduct extensive experimental validations. The results show that the proposed method generates photorealistic stylization outputs that are more preferred by human subjects as compared to those by the competing methods while running much faster. Source code and additional results are available at .

* Accepted by ECCV 2018 

Quantitative Analysis of Automatic Image Cropping Algorithms: A Dataset and Comparative Study

Jan 05, 2017
Yi-Ling Chen, Tzu-Wei Huang, Kai-Han Chang, Yu-Chen Tsai, Hwann-Tzong Chen, Bing-Yu Chen

Automatic photo cropping is an important tool for improving visual quality of digital photos without resorting to tedious manual selection. Traditionally, photo cropping is accomplished by determining the best proposal window through visual quality assessment or saliency detection. In essence, the performance of an image cropper highly depends on the ability to correctly rank a number of visually similar proposal windows. Despite the ranking nature of automatic photo cropping, little attention has been paid to learning-to-rank algorithms in tackling such a problem. In this work, we conduct an extensive study on traditional approaches as well as ranking-based croppers trained on various image features. In addition, a new dataset consisting of high quality cropping and pairwise ranking annotations is presented to evaluate the performance of various baselines. The experimental results on the new dataset provide useful insights into the design of better photo cropping algorithms.

* The dataset presented in this article can be found on Github 

State of the Art on Neural Rendering

Apr 08, 2020
Ayush Tewari, Ohad Fried, Justus Thies, Vincent Sitzmann, Stephen Lombardi, Kalyan Sunkavalli, Ricardo Martin-Brualla, Tomas Simon, Jason Saragih, Matthias Nießner, Rohit Pandey, Sean Fanello, Gordon Wetzstein, Jun-Yan Zhu, Christian Theobalt, Maneesh Agrawala, Eli Shechtman, Dan B Goldman, Michael Zollhöfer

Efficient rendering of photo-realistic virtual worlds is a long standing effort of computer graphics. Modern graphics techniques have succeeded in synthesizing photo-realistic images from hand-crafted scene representations. However, the automatic generation of shape, materials, lighting, and other aspects of scenes remains a challenging problem that, if solved, would make photo-realistic computer graphics more widely accessible. Concurrently, progress in computer vision and machine learning have given rise to a new approach to image synthesis and editing, namely deep generative models. Neural rendering is a new and rapidly emerging field that combines generative machine learning techniques with physical knowledge from computer graphics, e.g., by the integration of differentiable rendering into network training. With a plethora of applications in computer graphics and vision, neural rendering is poised to become a new area in the graphics community, yet no survey of this emerging field exists. This state-of-the-art report summarizes the recent trends and applications of neural rendering. We focus on approaches that combine classic computer graphics techniques with deep generative models to obtain controllable and photo-realistic outputs. Starting with an overview of the underlying computer graphics and machine learning concepts, we discuss critical aspects of neural rendering approaches. This state-of-the-art report is focused on the many important use cases for the described algorithms such as novel view synthesis, semantic photo manipulation, facial and body reenactment, relighting, free-viewpoint video, and the creation of photo-realistic avatars for virtual and augmented reality telepresence. Finally, we conclude with a discussion of the social implications of such technology and investigate open research problems.

* Eurographics 2020 survey paper 

Feasibility study of urban flood mapping using traffic signs for route optimization

Sep 24, 2021
Bahareh Alizadeh, Diya Li, Zhe Zhang, Amir H. Behzadan

Water events are the most frequent and costliest climate disasters around the world. In the U.S., an estimated 127 million people who live in coastal areas are at risk of substantial home damage from hurricanes or flooding. In flood emergency management, timely and effective spatial decision-making and intelligent routing depend on flood depth information at a fine spatiotemporal scale. In this paper, crowdsourcing is utilized to collect photos of submerged stop signs, and pair each photo with a pre-flood photo taken at the same location. Each photo pair is then analyzed using deep neural network and image processing to estimate the depth of floodwater in the location of the photo. Generated point-by-point depth data is converted to a flood inundation map and used by an A* search algorithm to determine an optimal flood-free path connecting points of interest. Results provide crucial information to rescue teams and evacuees by enabling effective wayfinding during flooding events.

* URL: 

GPU-Accelerated Mobile Multi-view Style Transfer

Mar 02, 2020
Puneet Kohli, Saravana Gunaseelan, Jason Orozco, Yiwen Hua, Edward Li, Nicolas Dahlquist

An estimated 60% of smartphones sold in 2018 were equipped with multiple rear cameras, enabling a wide variety of 3D-enabled applications such as 3D Photos. The success of 3D Photo platforms (Facebook 3D Photo, Holopix, etc) depend on a steady influx of user generated content. These platforms must provide simple image manipulation tools to facilitate content creation, akin to traditional photo platforms. Artistic neural style transfer, propelled by recent advancements in GPU technology, is one such tool for enhancing traditional photos. However, naively extrapolating single-view neural style transfer to the multi-view scenario produces visually inconsistent results and is prohibitively slow on mobile devices. We present a GPU-accelerated multi-view style transfer pipeline which enforces style consistency between views with on-demand performance on mobile platforms. Our pipeline is modular and creates high quality depth and parallax effects from a stereoscopic image pair.

* 6 pages, 5 figures