"photo style transfer": models, code, and papers

Color2Style: Real-Time Exemplar-Based Image Colorization with Self-Reference Learning and Deep Feature Modulation

Jun 16, 2021
Hengyuan Zhao, Wenhao Wu, Yihao Liu, Dongliang He

Legacy black-and-white photos are filled with people's nostalgia and glorious memories of the past. To better relive these frozen moments, in this paper we present a deep exemplar-based image colorization approach named Color2Style to resurrect these grayscale image media by filling them with vibrant colors. Exemplar-based colorization usually adopts unsupervised, unpaired training, because input and ground-truth image pairs are difficult to obtain. To train an exemplar-based colorization model, current algorithms usually rely on two procedures: i) retrieving a large number of reference images with high similarity in advance, which is inevitably time-consuming and tedious; ii) designing complicated modules to transfer the colors of the reference image to the grayscale image, by calculating and leveraging the deep semantic correspondence between them (e.g., non-local operation). In contrast to previous methods, we solve and simplify the above two steps in one end-to-end learning procedure. First, we adopt a self-augmented self-reference training scheme, where the reference image is generated by graphical transformations from the original color image, so that the training can be formulated in a paired manner. Second, instead of computing complex and inexplicable correspondence maps, our method exploits a simple yet effective deep feature modulation (DFM) module, which injects the color embeddings extracted from the reference image into the deep representations of the input grayscale image. This design is much more lightweight and intelligible, achieving appealing performance with real-time processing speed. Moreover, unlike existing methods, our model does not require multifarious loss functions and regularization terms, but only two widely used loss functions. Codes and models will be available at

* 16 pages, 21 figures 
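
One way to picture the deep feature modulation described in the Color2Style abstract is as a per-channel scale and shift of the grayscale features, with both predicted from the reference color embedding. The sketch below assumes that form; the abstract does not give the DFM equations, and the names `linear`, `modulate`, `w_scale`, and `w_shift` are hypothetical. Plain Python lists are used to keep the example dependency-free.

```python
# Illustrative sketch only: the abstract does not specify the DFM equations.
# Assumption: modulation is a per-channel affine transform (scale and shift)
# whose parameters are linear functions of a reference color embedding.

def linear(weights, vec):
    """Multiply a (rows x len(vec)) weight matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]

def modulate(features, embedding, w_scale, w_shift):
    """Apply a channel-wise scale and shift to grayscale features.

    features:  list of C channels, each an H x W list of floats
    embedding: color embedding of the reference image (length D)
    w_scale, w_shift: hypothetical C x D projection matrices
    """
    scales = linear(w_scale, embedding)
    shifts = linear(w_shift, embedding)
    return [
        [[(1.0 + s) * x + b for x in row] for row in channel]
        for channel, s, b in zip(features, scales, shifts)
    ]

# Two feature channels of size 1 x 2 and a 2-dimensional color embedding.
features = [[[1.0, 2.0]], [[3.0, 4.0]]]
embedding = [1.0, 0.5]
w_scale = [[1.0, 0.0], [0.0, 2.0]]   # gives scales [1.0, 1.0]
w_shift = [[0.0, 2.0], [-1.0, 0.0]]  # gives shifts [1.0, -1.0]
modulated = modulate(features, embedding, w_scale, w_shift)
```

Using `1.0 + s` as the multiplier means a zero embedding leaves the features unchanged, which is a common convention for this kind of conditioning rather than something stated in the abstract.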

DGL-GAN: Discriminator Guided Learning for GAN Compression

Dec 13, 2021
Yuesong Tian, Li Shen, Dacheng Tao, Zhifeng Li, Wei Liu

Generative Adversarial Networks (GANs) with high computation costs, e.g., BigGAN and StyleGAN2, have achieved remarkable results in synthesizing high-resolution and diverse images with high fidelity from random noise. Reducing the computation cost of GANs while still generating photo-realistic images is an urgent and challenging problem for their broad application on devices with limited computational resources. In this work, we propose a novel yet simple Discriminator Guided Learning approach for compressing vanilla GANs, dubbed DGL-GAN. Motivated by the observation that the teacher discriminator may contain meaningful information, we transfer knowledge solely from the teacher discriminator via the adversarial function. We show that DGL-GAN is valid: empirically, learning from the teacher discriminator improves the performance of student GANs, as verified by extensive experiments. Furthermore, we propose a two-stage training strategy for DGL-GAN, which largely stabilizes its training process and achieves superior performance when we apply DGL-GAN to compress the two most representative large-scale vanilla GANs, i.e., StyleGAN2 and BigGAN. Experiments show that DGL-GAN achieves state-of-the-art (SOTA) results on both StyleGAN2 (FID 2.92 on FFHQ with nearly 1/3 of the parameters of StyleGAN2) and BigGAN (IS 93.29 and FID 9.92 on ImageNet with nearly 1/4 of the parameters of BigGAN) and also outperforms several existing vanilla GAN compression techniques. Moreover, DGL-GAN is also effective in boosting the performance of original uncompressed GANs: the original uncompressed StyleGAN2 boosted with DGL-GAN achieves FID 2.65 on FFHQ, a new state-of-the-art result. Code and models are available at \url{}.
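
The idea of transferring knowledge from the teacher discriminator "via the adversarial function" can be sketched as a student generator loss with two adversarial terms, one scored by the student's own discriminator and one by the frozen teacher discriminator. This is only a guess at the general shape: the non-saturating loss form, the `teacher_weight` factor, and the function names are assumptions not taken from the abstract.

```python
import math

# Illustrative sketch only. Assumptions: a non-saturating GAN loss and a
# scalar weight on the teacher term; neither is specified in the abstract.

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def adversarial_term(logits):
    """Non-saturating generator loss: mean of softplus(-D(G(z)))."""
    return sum(softplus(-logit) for logit in logits) / len(logits)

def generator_loss(student_logits, teacher_logits, teacher_weight=1.0):
    """Combine the student and teacher discriminator signals.

    student_logits: scores of generated images from the student discriminator
    teacher_logits: scores of the same images from the frozen teacher
    teacher_weight: hypothetical balancing factor
    """
    return (adversarial_term(student_logits)
            + teacher_weight * adversarial_term(teacher_logits))

# The loss drops when the teacher discriminator rates the samples as real.
loss_fooled = generator_loss([0.0, 0.0], [5.0, 5.0])
loss_caught = generator_loss([0.0, 0.0], [-5.0, -5.0])
```

In a real training loop the logits would come from discriminator networks and the loss would be backpropagated through the student generator only.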


Unpaired High-Resolution and Scalable Style Transfer Using Generative Adversarial Networks

Oct 10, 2018
Andrej Junginger, Markus Hanselmann, Thilo Strauss, Sebastian Boblest, Jens Buchner, Holger Ulmer

Neural networks have proven their capabilities by outperforming many other approaches on regression or classification tasks on various kinds of data. Other astonishing results have been achieved using neural nets as data generators, especially in settings of generative adversarial networks (GANs). One special application is the field of image domain translations. Here, the goal is to take an image with a certain style (e.g. a photograph) and transform it into another one (e.g. a painting). If such a task is performed for unpaired training examples, the corresponding GAN setting is complex and the neural networks are large, which leads to a high peak memory consumption during both the training and evaluation phases. This sets a limit to the highest processable image size. We address this issue by not processing the whole image at once; instead, we train and evaluate the domain translation on the level of overlapping image subsamples. This new approach not only enables us to translate high-resolution images that otherwise cannot be processed by the neural network at once, but also allows us to work with comparably small neural networks and with limited hardware resources. Additionally, the number of images required for the training process is significantly reduced. We present high-quality results on images with a total resolution of over 50 megapixels and demonstrate that our method helps to preserve local image details while also keeping global consistency.

* 10 pages, 8 figures 
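
The evaluation side of the overlapping-subsample idea can be sketched as follows: split the image into overlapping tiles, translate each tile independently, and merge the results. The abstract does not say how overlapping outputs are merged, so the plain averaging used here is an assumption, as are the names `tile_starts` and `translate_tiled`; the `translate` argument stands in for the trained network.

```python
# Illustrative sketch only. Assumption: overlapping tile outputs are merged
# by averaging; the abstract does not describe the actual merging strategy.

def tile_starts(length, tile, stride):
    """Start offsets of overlapping tiles that cover [0, length)."""
    if not 0 < stride <= tile:
        raise ValueError("stride must be in (0, tile] to avoid gaps")
    if tile >= length:
        return [0]
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # final tile is flush with the border
    return starts

def translate_tiled(image, tile, stride, translate):
    """Translate an H x W image (list of rows) tile by tile."""
    height, width = len(image), len(image[0])
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for top in tile_starts(height, tile, stride):
        for left in tile_starts(width, tile, stride):
            patch = [row[left:left + tile] for row in image[top:top + tile]]
            out = translate(patch)  # stand-in for the translation network
            for dy, out_row in enumerate(out):
                for dx, value in enumerate(out_row):
                    total[top + dy][left + dx] += value
                    count[top + dy][left + dx] += 1
    # Average the contributions of all tiles that cover each pixel.
    return [[t / c for t, c in zip(t_row, c_row)]
            for t_row, c_row in zip(total, count)]

# A 5 x 7 test image and a toy "translation" that doubles every pixel.
image = [[float(y * 7 + x) for x in range(7)] for y in range(5)]
doubled = translate_tiled(
    image, tile=3, stride=2,
    translate=lambda patch: [[2.0 * v for v in row] for row in patch],
)
```

Peak memory then depends on the tile size rather than the full image size, which is the property the abstract highlights.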

MixSyn: Learning Composition and Style for Multi-Source Image Synthesis

Nov 24, 2021
Ilke Demir, Umur A. Ciftci

Synthetic images created by generative models increase in quality and expressiveness as newer models utilize larger datasets and novel architectures. Although this photorealism is a positive side-effect from a creative standpoint, it becomes problematic when such generative models are used for impersonation without consent. Most of these approaches are built on the partial transfer between source and target pairs, or they generate completely new samples based on an ideal distribution, still resembling the closest real sample in the dataset. We propose MixSyn (read as " mixin' ") for learning novel fuzzy compositions from multiple sources and creating novel images as a mix of image regions corresponding to the compositions. MixSyn not only combines uncorrelated regions from multiple source masks into a coherent semantic composition, but also generates mask-aware high quality reconstructions of non-existing images. We compare MixSyn to state-of-the-art single-source sequential generation and collage generation approaches in terms of quality, diversity, realism, and expressive power; while also showcasing interactive synthesis, mix & match, and edit propagation tasks, with no mask dependency.


One-Shot Face Reenactment on Megapixels

May 26, 2022
Wonjun Kang, Geonsu Lee, Hyung Il Koo, Nam Ik Cho

The goal of face reenactment is to transfer a target expression and head pose to a source face while preserving the source identity. With the popularity of face-related applications, there has been much research on this topic. However, the results of existing methods are still limited to low resolution and lack photorealism. In this work, we present a one-shot and high-resolution face reenactment method called MegaFR. To be precise, we leverage StyleGAN by using 3DMM-based rendering images and overcome the lack of high-quality video datasets by designing a loss function that works without high-quality videos. Also, we apply iterative refinement to deal with extreme poses and/or expressions. Since the proposed method controls source images through 3DMM parameters, we can explicitly manipulate source images. We apply MegaFR to various applications such as face frontalization, eye in-painting, and talking head generation. Experimental results show that our method successfully disentangles identity from expression and head pose, and outperforms conventional methods.

* 29 pages, 19 figures 

3D GAN Inversion for Controllable Portrait Image Animation

Mar 25, 2022
Connor Z. Lin, David B. Lindell, Eric R. Chan, Gordon Wetzstein

Millions of images of human faces are captured every single day, but these photographs portray the likeness of an individual with a fixed pose, expression, and appearance. Portrait image animation enables the post-capture adjustment of these attributes from a single image while maintaining a photorealistic reconstruction of the subject's likeness or identity. Still, current methods for portrait image animation are typically based on 2D warping operations or manipulations of a 2D generative adversarial network (GAN) and lack explicit mechanisms to enforce multi-view consistency. Thus, these methods may significantly alter the identity of the subject, especially when the viewpoint relative to the camera is changed. In this work, we leverage newly developed 3D GANs, which allow explicit control over the pose of the image subject with multi-view consistency. We propose a supervision strategy to flexibly manipulate expressions with 3D morphable models, and we show that the proposed method also supports editing appearance attributes, such as age or hairstyle, by interpolating within the latent space of the GAN. The proposed technique for portrait image animation outperforms previous methods in terms of image quality, identity preservation, and pose transfer while also supporting attribute editing.

* Project page: 
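
The attribute editing mentioned in the abstract, interpolating within the latent space of the GAN, can be illustrated with a plain linear interpolation between two latent codes. Linear interpolation and the name `interpolate_latents` are assumptions; the abstract does not say which interpolation scheme or latent space is used.

```python
# Illustrative sketch only. Assumption: a straight-line path between two
# latent codes; the abstract does not specify the interpolation scheme.

def interpolate_latents(w_source, w_target, steps):
    """Return `steps` latent codes evenly spaced from w_source to w_target."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    codes = []
    for i in range(steps):
        t = i / (steps - 1)  # 0.0 at the source, 1.0 at the target
        codes.append([(1.0 - t) * a + t * b
                      for a, b in zip(w_source, w_target)])
    return codes

# Three codes from a hypothetical "young" latent to an "old" latent.
path = interpolate_latents([0.0, 2.0], [1.0, 4.0], steps=3)
```

Each code along the path would be fed to the generator to render one intermediate appearance.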

Semantic Segmentation with Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization

Apr 12, 2021
Daiqing Li, Junlin Yang, Karsten Kreis, Antonio Torralba, Sanja Fidler

Training deep networks with limited labeled data while achieving a strong generalization ability is key in the quest to reduce human annotation efforts. This is the goal of semi-supervised learning, which exploits more widely available unlabeled data to complement small labeled data sets. In this paper, we propose a novel framework for discriminative pixel-level tasks using a generative model of both images and labels. Concretely, we learn a generative adversarial network that captures the joint image-label distribution and is trained efficiently using a large set of unlabeled images supplemented with only a few labeled ones. We build our architecture on top of StyleGAN2, augmented with a label synthesis branch. Image labeling at test time is achieved by first embedding the target image into the joint latent space via an encoder network and test-time optimization, and then generating the label from the inferred embedding. We evaluate our approach in two important domains: medical image segmentation and part-based face segmentation. We demonstrate strong in-domain performance compared to several baselines, and are the first to showcase extreme out-of-domain generalization, such as transferring from CT to MRI in medical imaging, and from photographs of real faces to paintings, sculptures, and even cartoons and animal faces. Project Page: \url{}

* CVPR2021 