"photo style transfer": models, code, and papers

Aesthetic Features for Personalized Photo Recommendation

Aug 31, 2018
Yu Qing Zhou, Ga Wu, Scott Sanner, Putra Manggala

Many photography websites such as Flickr, 500px, Unsplash, and Adobe Behance are used by amateur and professional photography enthusiasts. Unlike content-based image search, such users of photography websites are not just looking for photos with certain content, but more generally for photos with a certain photographic "aesthetic". In this context, we explore personalized photo recommendation and propose two aesthetic feature extraction methods based on (i) color space and (ii) deep style transfer embeddings. Using a dataset from 500px, we evaluate how these features can be best leveraged by collaborative filtering methods and show that (ii) provides a significant boost in photo recommendation performance.

* In Proceedings of the Late-Breaking Results track part of the Twelfth ACM Conference on Recommender Systems, Vancouver, BC, Canada, October 6, 2018, 2 pages 
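As a rough illustration of what a color-based aesthetic feature could look like, the sketch below computes a normalized per-channel RGB histogram. This is an assumption made for illustration only: the abstract does not say which color space or binning the authors use, and the function name `color_histogram_feature` is hypothetical.

```python
import numpy as np

def color_histogram_feature(image, bins=8):
    """Concatenate normalized per-channel histograms of an (H, W, 3) uint8 image.

    Illustrative only: the choice of RGB and of 8 bins is an assumption,
    not the feature described in the paper.
    """
    image = np.asarray(image)
    n_pixels = image.shape[0] * image.shape[1]
    parts = []
    for ch in range(3):
        # Histogram over the full 8-bit range, then convert counts to fractions.
        counts, _ = np.histogram(image[..., ch], bins=bins, range=(0, 256))
        parts.append(counts / n_pixels)
    return np.concatenate(parts)

# A pure red image: all red mass in the top bin, green and blue in the bottom bin.
red = np.zeros((4, 4, 3), dtype=np.uint8)
red[..., 0] = 255
feature = color_histogram_feature(red)
```

A vector like `feature` could then be fed to a recommender as side information about each photo; how the paper combines such features with collaborative filtering is not shown here.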

Photorealistic Style Transfer via Wavelet Transforms

Mar 23, 2019
Jaejun Yoo, Youngjung Uh, Sanghyuk Chun, Byeongkyu Kang, Jung-Woo Ha

Recent style transfer models have provided promising artistic results. However, given a photograph as a reference style, existing methods are limited by spatial distortions or unrealistic artifacts, which should not happen in real photographs. We introduce a theoretically sound correction to the network architecture that remarkably enhances photorealism and faithfully transfers the style. The key ingredient of our method is wavelet transforms, which fit naturally into deep networks. We propose a wavelet corrected transfer based on whitening and coloring transforms (WCT$^2$) that allows features to preserve their structural information and the statistical properties of the VGG feature space during stylization. This is the first and the only end-to-end model that can stylize a $1024\times1024$ resolution image in 4.7 seconds, giving a pleasing and photorealistic quality without any post-processing. Last but not least, our model provides stable video stylization without temporal constraints. The code, generated images, supplementary materials, and pre-trained models are all available at

* Code and data: 
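To make the role of wavelets more concrete, here is a minimal NumPy sketch of a single-level 2D Haar decomposition and its exact inverse. It only illustrates the general property that such a transform loses no information; it is not the authors' WCT$^2$ implementation, and the function names and sub-band labels (`ll`, `lh`, `hl`, `hh`) are assumptions chosen for this example.

```python
import numpy as np

def haar_decompose(x):
    """Single-level 2D Haar transform of a 2D array with even height and width."""
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # low-pass (local average, scaled)
    lh = (a - b + c - d) / 2.0  # difference between columns
    hl = (a + b - c - d) / 2.0  # difference between rows
    hh = (a - b - c + d) / 2.0  # diagonal difference
    return ll, lh, hl, hh

def haar_reconstruct(ll, lh, hl, hh):
    """Invert haar_decompose exactly."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=float)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x

image = np.arange(16.0).reshape(4, 4)
bands = haar_decompose(image)
restored = haar_reconstruct(*bands)
```

Unlike max pooling, which discards three of every four values, the four sub-bands together keep everything needed to rebuild the input, which is the kind of structure preservation the abstract refers to.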

A Flexible Convolutional Solver with Application to Photorealistic Style Transfer

Jun 13, 2018
Gilles Puy, Patrick Pérez

We propose a new flexible deep convolutional neural network (convnet) to perform fast visual style transfer. In contrast to existing convnets that address the same task, our architecture derives directly from the structure of the gradient descent originally used to solve the style transfer problem [Gatys et al., 2016]. Like existing convnets, ours approximately solves the original problem much faster than the gradient descent. However, our network is uniquely flexible by design: it can be manipulated at runtime to enforce new constraints on the final solution. In particular, we show how to modify it to obtain a photorealistic result with no retraining. We study the modifications made by [Luan et al., 2017] to the original cost function of [Gatys et al., 2016] to achieve photorealistic style transfer. These modifications directly affect the gradient descent and can be carried over on-the-fly to our network. This is possible because the proposed architecture stems from unrolling the gradient descent.
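The idea of unrolling gradient descent and changing the cost at runtime can be sketched on a toy problem. The example below is an assumption-laden illustration, not the paper's network: the quadratic cost terms, the fixed step size, and the names `unrolled_descent`, `grad_content`, `grad_style`, and `grad_extra` are all hypothetical.

```python
import numpy as np

def unrolled_descent(x0, grad_terms, step=0.1, n_steps=50):
    """Run a fixed number of gradient steps; each step acts like one 'layer'.

    grad_terms is a list of functions returning the gradient of one cost term,
    so terms can be added or removed at run time without changing the solver.
    """
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_steps):
        g = sum(term(x) for term in grad_terms)
        x = x - step * g
    return x

# Toy cost: 0.5*||x - content||^2 + 0.5*||x - style||^2 (stand-ins for real losses).
content = np.array([1.0, 0.0])
style = np.array([0.0, 1.0])
grad_content = lambda x: x - content
grad_style = lambda x: x - style

base = unrolled_descent(np.zeros(2), [grad_content, grad_style])

# Add an extra constraint term, ||x - anchor||^2, at run time.
anchor = np.array([1.0, 1.0])
grad_extra = lambda x: 2.0 * (x - anchor)
constrained = unrolled_descent(np.zeros(2), [grad_content, grad_style, grad_extra])
```

Here `base` settles near the midpoint of `content` and `style`, while `constrained` is pulled toward `anchor`; the solver itself is unchanged, which mirrors the "no retraining" claim only in spirit.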


TextStyleBrush: Transfer of Text Aesthetics from a Single Example

Jun 15, 2021
Praveen Krishnan, Rama Kovvuri, Guan Pang, Boris Vassilev, Tal Hassner

We present a novel approach for disentangling the content of a text image from all aspects of its appearance. The appearance representation we derive can then be applied to new content, for one-shot transfer of the source style to new content. We learn this disentanglement in a self-supervised manner. Our method processes entire word boxes, without requiring segmentation of text from background, per-character processing, or making assumptions on string lengths. We show results in different text domains which were previously handled by specialized methods, e.g., scene text, handwritten text. To these ends, we make a number of technical contributions: (1) We disentangle the style and content of a textual image into a non-parametric, fixed-dimensional vector. (2) We propose a novel approach inspired by StyleGAN but conditioned over the example style at different resolution and content. (3) We present novel self-supervised training criteria which preserve both source style and target content using a pre-trained font classifier and text recognizer. Finally, (4) we also introduce Imgur5K, a new challenging dataset for handwritten word images. We offer numerous qualitative photo-realistic results of our method. We further show that our method surpasses previous work in quantitative tests on scene text and handwriting datasets, as well as in a user study.

* 18 pages, 13 figures 

Learning Semantic Person Image Generation by Region-Adaptive Normalization

Apr 14, 2021
Zhengyao Lv, Xiaoming Li, Xin Li, Fu Li, Tianwei Lin, Dongliang He, Wangmeng Zuo

Human pose transfer has received great attention due to its wide applications, yet it remains a challenging task that is not well solved. Recent works have achieved great success in transferring the person image from the source to the target pose. However, most of them cannot capture the semantic appearance well, resulting in inconsistent and less realistic textures in the reconstructed results. To address this issue, we propose a new two-stage framework to handle the pose and appearance translation. In the first stage, we predict the target semantic parsing maps to eliminate the difficulties of pose transfer and further benefit the subsequent translation of per-region appearance style. In the second stage, with the predicted target semantic maps, we suggest a new person image generation method that incorporates region-adaptive normalization, which uses the per-region styles to guide the target appearance generation. Extensive experiments show that our proposed SPGNet can generate more semantic, consistent, and photo-realistic results and perform favorably against state-of-the-art methods in terms of quantitative and qualitative evaluation. The source code and model are available at
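One generic way to apply per-region styles through normalization is to normalize the features and then scale and shift each pixel with parameters looked up from its region label. The sketch below shows that general pattern only; it is not taken from the SPGNet code, and the function name, argument shapes, and the use of fixed (rather than learned) `gamma` and `beta` tables are assumptions.

```python
import numpy as np

def region_adaptive_norm(feat, seg, gamma, beta, eps=1e-5):
    """Normalize (C, H, W) features, then modulate them per semantic region.

    seg is an (H, W) integer parsing map with labels in [0, R).
    gamma and beta are (R, C) tables of per-region scale and shift
    (hypothetical stand-ins for parameters a network would predict).
    """
    mean = feat.mean(axis=(1, 2), keepdims=True)
    var = feat.var(axis=(1, 2), keepdims=True)
    normed = (feat - mean) / np.sqrt(var + eps)
    # Look up each pixel's parameters by region label: (H, W, C) -> (C, H, W).
    g = gamma[seg].transpose(2, 0, 1)
    b = beta[seg].transpose(2, 0, 1)
    return normed * g + b

rng = np.random.default_rng(0)
feat = rng.normal(size=(3, 4, 4))
seg = np.zeros((4, 4), dtype=int)
seg[:, 2:] = 1                                   # right half is region 1
gamma = np.ones((2, 3))
beta = np.stack([np.zeros(3), np.full(3, 10.0)])  # region 1 gets a +10 shift
out = region_adaptive_norm(feat, seg, gamma, beta)
```

With these toy parameters, pixels in region 1 are shifted by 10 relative to region 0, showing how a parsing map can steer appearance region by region.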


GarmentGAN: Photo-realistic Adversarial Fashion Transfer

Mar 04, 2020
Amir Hossein Raffiee, Michael Sollami

The garment transfer problem comprises two tasks: learning to separate a person's body (pose, shape, color) from their clothing (garment type, shape, style) and then generating new images of the wearer dressed in arbitrary garments. We present GarmentGAN, a new algorithm that performs image-based garment transfer through generative adversarial methods. The GarmentGAN framework allows users to virtually try on items before purchase and generalizes to various apparel types. GarmentGAN requires as input only two images, namely, a picture of the target fashion item and an image containing the customer. The output is a synthetic image wherein the customer is wearing the target apparel. In order to make the generated image look photo-realistic, we employ novel generative adversarial techniques. GarmentGAN improves on existing methods in the realism of generated imagery and solves various problems related to self-occlusions. Our proposed model incorporates additional information during training, utilizing both segmentation maps and body key-point information. We show qualitative and quantitative comparisons to several other networks to demonstrate the effectiveness of this technique.

* 9 pages and 7 figures 

SurReal: enhancing Surgical simulation Realism using style transfer

Nov 07, 2018
Imanol Luengo, Evangello Flouty, Petros Giataganas, Piyamate Wisanuvej, Jean Nehme, Danail Stoyanov

Surgical simulation is an increasingly important element of surgical education. Using simulation can be a means to address some of the significant challenges in developing surgical skills with limited time and resources. The photo-realistic fidelity of simulations is a key feature that can improve the experience and transfer ratio of trainees. In this paper, we demonstrate how we can enhance the visual fidelity of existing surgical simulation by performing style transfer of multi-class labels from real surgical video onto synthetic content. We demonstrate our approach on simulations of cataract surgery using real data labels from an existing public dataset. Our results highlight the feasibility of the approach and also the powerful possibility to extend this technique to incorporate additional temporal constraints and to different applications.

* BMVC 2018 

Depth-aware Neural Style Transfer using Instance Normalization

Mar 17, 2022
Eleftherios Ioannou, Steve Maddock

Neural Style Transfer (NST) is concerned with the artistic stylization of visual media. It can be described as the process of transferring the style of an artistic image onto an ordinary photograph. Recently, a number of studies have considered the enhancement of the depth-preserving capabilities of the NST algorithms to address the undesired effects that occur when the input content images include numerous objects at various depths. Our approach uses a deep residual convolutional network with instance normalization layers that utilizes an advanced depth prediction network to integrate depth preservation as an additional loss function to content and style. We demonstrate results that are effective in retaining the depth and global structure of content images. Three different evaluation processes show that our system is capable of preserving the structure of the stylized results while exhibiting style-capture capabilities and aesthetic qualities comparable or superior to state-of-the-art methods.

* 16 pages, 7 figures, submitted to European Conference on Computer Vision (ECCV) 2022 
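For readers unfamiliar with the instance normalization layers mentioned above, a minimal NumPy sketch follows. It shows the standard operation without learned scale and shift parameters and is not the authors' network; the function name and the `eps` default are assumptions.

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    """Normalize each channel of each sample over its spatial dimensions.

    x has shape (N, C, H, W). Statistics are computed per (sample, channel)
    pair, unlike batch normalization, which also pools over the batch axis.
    """
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(2, 3, 8, 8))
y = instance_norm(x)
```

After normalization every feature map in `y` has approximately zero mean and unit variance, so per-image contrast statistics are removed before later layers apply the style.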

Spatial Content Alignment For Pose Transfer

Mar 31, 2021
Wing-Yin Yu, Lai-Man Po, Yuzhi Zhao, Jingjing Xiong, Kin-Wai Lau

Due to unreliable geometric matching and content misalignment, most conventional pose transfer algorithms fail to generate fine-grained person images. In this paper, we propose a novel framework, Spatial Content Alignment GAN (SCAGAN), which aims to enhance the content consistency of garment textures and the details of human characteristics. First, we alleviate the spatial misalignment by transferring the edge content to the target pose in advance. Second, we introduce a new Content-Style DeBlk which can progressively synthesize photo-realistic person images based on the appearance features of the source image, the target pose heatmap, and the prior transferred content in the edge domain. We compare the proposed framework with several state-of-the-art methods to show its superiority in quantitative and qualitative analysis. Moreover, detailed ablation study results demonstrate the efficacy of our contributions. Codes are publicly available at

* IEEE International Conference on Multimedia and Expo (ICME) 2021 Oral 

Artist Style Transfer Via Quadratic Potential

Mar 05, 2019
Rahul Bhalley, Jianlin Su

In this paper we address the problem of artist style transfer, where the painting style of a given artist is applied to a real-world photograph. We train our neural networks in an adversarial setting via the recently introduced quadratic potential divergence for a stable learning process. To further improve the quality of the generated artist-stylized images, we also integrate some recently introduced deep learning techniques into our method. To the best of our knowledge, this is the first attempt at artist style transfer via quadratic potential divergence. We provide some stylized image samples in the supplementary material. The source code for experimentation was written in PyTorch and is available online in my GitHub repository.

* 8 pages, 3 figures, uses nips_2018.sty, renamed the network to CycleGAN-QP for maintaining consistency with work 