Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"photo style transfer": models, code, and papers

MixSyn: Learning Composition and Style for Multi-Source Image Synthesis

Nov 24, 2021
Ilke Demir, Umur A. Ciftci

Synthetic images created by generative models increase in quality and expressiveness as newer models utilize larger datasets and novel architectures. Although this photorealism is a positive side-effect from a creative standpoint, it becomes problematic when such generative models are used for impersonation without consent. Most of these approaches are built on the partial transfer between source and target pairs, or they generate completely new samples based on an ideal distribution, still resembling the closest real sample in the dataset. We propose MixSyn (read as " mixin' ") for learning novel fuzzy compositions from multiple sources and creating novel images as a mix of image regions corresponding to the compositions. MixSyn not only combines uncorrelated regions from multiple source masks into a coherent semantic composition, but also generates mask-aware high quality reconstructions of non-existing images. We compare MixSyn to state-of-the-art single-source sequential generation and collage generation approaches in terms of quality, diversity, realism, and expressive power; while also showcasing interactive synthesis, mix & match, and edit propagation tasks, with no mask dependency.


One-Shot Face Reenactment on Megapixels

May 26, 2022
Wonjun Kang, Geonsu Lee, Hyung Il Koo, Nam Ik Cho

The goal of face reenactment is to transfer a target expression and head pose to a source face while preserving the source identity. With the popularity of face-related applications, there has been much research on this topic. However, the results of existing methods are still limited to low-resolution and lack photorealism. In this work, we present a one-shot and high-resolution face reenactment method called MegaFR. To be precise, we leverage StyleGAN by using 3DMM-based rendering images and overcome the lack of high-quality video datasets by designing a loss function that works without high-quality videos. Also, we apply iterative refinement to deal with extreme poses and/or expressions. Since the proposed method controls source images through 3DMM parameters, we can explicitly manipulate source images. We apply MegaFR to various applications such as face frontalization, eye in-painting, and talking head generation. Experimental results show that our method successfully disentangles identity from expression and head pose, and outperforms conventional methods.

* 29 pages, 19 figures 

3D GAN Inversion for Controllable Portrait Image Animation

Mar 25, 2022
Connor Z. Lin, David B. Lindell, Eric R. Chan, Gordon Wetzstein

Millions of images of human faces are captured every single day; but these photographs portray the likeness of an individual with a fixed pose, expression, and appearance. Portrait image animation enables the post-capture adjustment of these attributes from a single image while maintaining a photorealistic reconstruction of the subject's likeness or identity. Still, current methods for portrait image animation are typically based on 2D warping operations or manipulations of a 2D generative adversarial network (GAN) and lack explicit mechanisms to enforce multi-view consistency. Thus these methods may significantly alter the identity of the subject, especially when the viewpoint relative to the camera is changed. In this work, we leverage newly developed 3D GANs, which allow explicit control over the pose of the image subject with multi-view consistency. We propose a supervision strategy to flexibly manipulate expressions with 3D morphable models, and we show that the proposed method also supports editing appearance attributes, such as age or hairstyle, by interpolating within the latent space of the GAN. The proposed technique for portrait image animation outperforms previous methods in terms of image quality, identity preservation, and pose transfer while also supporting attribute editing.

* Project page: 

Semantic Segmentation with Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization

Apr 12, 2021
Daiqing Li, Junlin Yang, Karsten Kreis, Antonio Torralba, Sanja Fidler

Training deep networks with limited labeled data while achieving a strong generalization ability is key in the quest to reduce human annotation efforts. This is the goal of semi-supervised learning, which exploits more widely available unlabeled data to complement small labeled data sets. In this paper, we propose a novel framework for discriminative pixel-level tasks using a generative model of both images and labels. Concretely, we learn a generative adversarial network that captures the joint image-label distribution and is trained efficiently using a large set of unlabeled images supplemented with only few labeled ones. We build our architecture on top of StyleGAN2, augmented with a label synthesis branch. Image labeling at test time is achieved by first embedding the target image into the joint latent space via an encoder network and test-time optimization, and then generating the label from the inferred embedding. We evaluate our approach in two important domains: medical image segmentation and part-based face segmentation. We demonstrate strong in-domain performance compared to several baselines, and are the first to showcase extreme out-of-domain generalization, such as transferring from CT to MRI in medical imaging, and photographs of real faces to paintings, sculptures, and even cartoons and animal faces. Project Page: \url{}

* CVPR2021