Jun-Yan Zhu

The Hessian Penalty: A Weak Prior for Unsupervised Disentanglement

Aug 24, 2020
William Peebles, John Peebles, Jun-Yan Zhu, Alexei Efros, Antonio Torralba

Existing disentanglement methods for deep generative models rely on hand-picked priors and complex encoder-based architectures. In this paper, we propose the Hessian Penalty, a simple regularization term that encourages the Hessian of a generative model with respect to its input to be diagonal. We introduce a model-agnostic, unbiased stochastic approximation of this term based on Hutchinson's estimator to compute it efficiently during training. Our method can be applied to a wide range of deep generators with just a few lines of code. We show that training with the Hessian Penalty often causes axis-aligned disentanglement to emerge in latent space when applied to ProGAN on several datasets. Additionally, we use our regularization term to identify interpretable directions in BigGAN's latent space in an unsupervised fashion. Finally, we provide empirical evidence that the Hessian Penalty encourages substantial shrinkage when applied to over-parameterized latent spaces.
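
The idea of penalizing a non-diagonal Hessian can be illustrated with a minimal sketch. It assumes a scalar-valued function `f` standing in for the generator, Rademacher (random ±1) probe vectors, and a central finite difference to approximate the quadratic form v^T H v; the names `hessian_penalty`, `k`, and `eps` are hypothetical, and none of these details are stated in the abstract. When the Hessian is diagonal, v^T H v is the same for every sign vector, so its variance over probes is zero.

```python
import random
from statistics import variance


def hessian_penalty(f, z, k=16, eps=0.1, rng=None):
    """Estimate Var_v[v^T H v] for a scalar function f at the point z.

    v is a Rademacher vector. If the Hessian H of f is diagonal, v^T H v
    equals the sum of the diagonal entries for every v, so the variance is 0.
    """
    rng = rng or random.Random(0)
    f_z = f(z)
    second_diffs = []
    for _ in range(k):
        v = [rng.choice((-1.0, 1.0)) for _ in z]
        z_plus = [zi + eps * vi for zi, vi in zip(z, v)]
        z_minus = [zi - eps * vi for zi, vi in zip(z, v)]
        # Central second difference along v approximates v^T H v.
        second_diffs.append((f(z_plus) - 2.0 * f_z + f(z_minus)) / eps ** 2)
    # Sample variance across the k probe directions.
    return variance(second_diffs)


def separable(z):
    """Toy function with a diagonal Hessian."""
    return z[0] ** 2 + 3.0 * z[1] ** 2


def entangled(z):
    """Toy function whose Hessian has only off-diagonal entries."""
    return z[0] * z[1]
```

Under these assumptions, `hessian_penalty(separable, z)` stays near zero while `hessian_penalty(entangled, z)` is clearly positive, which is the signal a regularizer of this kind would push down during training.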

* ECCV 2020 (Spotlight). Code available at https://github.com/wpeebles/hessian_penalty. Project page and videos available at https://www.wpeebles.com/hessian-penalty

Contrastive Learning for Unpaired Image-to-Image Translation

Aug 20, 2020
Taesung Park, Alexei A. Efros, Richard Zhang, Jun-Yan Zhu

In image-to-image translation, each patch in the output should reflect the content of the corresponding patch in the input, independent of domain. We propose a straightforward method for doing so -- maximizing mutual information between the two, using a framework based on contrastive learning. The method encourages two elements (corresponding patches) to map to a similar point in a learned feature space, relative to other elements (other patches) in the dataset, referred to as negatives. We explore several critical design choices for making contrastive learning effective in the image synthesis setting. Notably, we use a multilayer, patch-based approach, rather than operating on entire images. Furthermore, we draw negatives from within the input image itself, rather than from the rest of the dataset. We demonstrate that our framework enables one-sided translation in the unpaired image-to-image translation setting, while improving quality and reducing training time. In addition, our method can even be extended to the training setting where each "domain" is only a single image.
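
The patch-level contrastive objective described above can be sketched as a softmax cross-entropy over one positive and several negatives. This is an illustrative sketch only: the function name `patch_nce_loss`, the cosine similarity, and the temperature value `0.07` are assumptions, not details given in the abstract. The query is the feature of an output patch, the positive is the feature of the corresponding input patch, and the negatives are features of other patches from the same input image.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale v to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(dot(v, v)) or 1.0
    return [x / norm for x in v]


def patch_nce_loss(query, positive, negatives, temperature=0.07):
    """Cross-entropy of selecting the positive among positive + negatives."""
    q = normalize(query)
    # Index 0 holds the positive; the remaining entries are negatives.
    logits = [dot(q, normalize(positive)) / temperature]
    logits += [dot(q, normalize(n)) / temperature for n in negatives]
    # Numerically stable log-sum-exp.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]
```

The loss is small when the query is closer to its positive than to any negative, and grows when a negative patch is the better match.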

* ECCV 2020. Please visit https://taesungp.github.io/ContrastiveUnpairedTranslation/ for introduction videos and more. v3 contains typo fixes and a citation update.

Rewriting a Deep Generative Model

Jul 30, 2020
David Bau, Steven Liu, Tongzhou Wang, Jun-Yan Zhu, Antonio Torralba

A deep generative model such as a GAN learns to model a rich set of semantic and physical rules about the target distribution, but up to now, it has been obscure how such rules are encoded in the network, or how a rule could be changed. In this paper, we introduce a new problem setting: manipulation of specific rules encoded by a deep generative model. To address the problem, we propose a formulation in which the desired rule is changed by manipulating a layer of a deep network as a linear associative memory. We derive an algorithm for modifying one entry of the associative memory, and we demonstrate that several interesting structural rules can be located and modified within the layers of state-of-the-art generative models. We present a user interface to enable users to interactively change the rules of a generative model to achieve desired effects, and we show several proof-of-concept applications. Finally, results on multiple datasets demonstrate the advantage of our method against standard fine-tuning methods and edit transfer algorithms.
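
To make the associative-memory view concrete, the sketch below treats a weight matrix `W` as a memory that maps a key vector to a value vector, and changes a single entry with a rank-one update. It is a simplified illustration, assuming the plain minimal-norm update W' = W + (v - W k) k^T / (k^T k); it is not claimed to be the algorithm derived in the paper, and the names `matvec` and `rank_one_edit` are hypothetical.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rank_one_edit(W, key, value):
    """Return W' such that W' @ key == value, changing W only along `key`."""
    # How far the current memory output is from the desired value.
    residual = [v - o for v, o in zip(value, matvec(W, key))]
    key_sq = sum(k * k for k in key)
    # Add the outer product residual * key^T / (key . key) to W.
    return [
        [w + r * k / key_sq for w, k in zip(row, key)]
        for row, r in zip(W, residual)
    ]
```

With this update the edited key returns the new value, while any input orthogonal to the key produces the same output as before, so the other stored associations in this toy setting are left alone.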

* ECCV 2020 (oral). Code at https://github.com/davidbau/rewriting. For videos and demos see https://rewriting.csail.mit.edu/

Swapping Autoencoder for Deep Image Manipulation

Jul 01, 2020
Taesung Park, Jun-Yan Zhu, Oliver Wang, Jingwan Lu, Eli Shechtman, Alexei A. Efros, Richard Zhang

Deep generative models have become increasingly effective at producing realistic images from randomly sampled seeds, but using such models for controllable manipulation of existing images remains challenging. We propose the Swapping Autoencoder, a deep model designed specifically for image manipulation, rather than random sampling. The key idea is to encode an image with two independent components and enforce that any swapped combination maps to a realistic image. In particular, we encourage the components to represent structure and texture, by enforcing one component to encode co-occurrent patch statistics across different parts of an image. As our method is trained with an encoder, finding the latent codes for a new input image becomes trivial, rather than cumbersome. As a result, it can be used to manipulate real input images in various ways, including texture swapping, local and global editing, and latent code vector arithmetic. Experiments on multiple datasets show that our model produces better results and is substantially more efficient compared to recent generative models.

Differentiable Augmentation for Data-Efficient GAN Training

Jun 18, 2020
Shengyu Zhao, Zhijian Liu, Ji Lin, Jun-Yan Zhu, Song Han

The performance of generative adversarial networks (GANs) heavily deteriorates given a limited amount of training data. This is mainly because the discriminator is memorizing the exact training set. To combat it, we propose Differentiable Augmentation (DiffAugment), a simple method that improves the data efficiency of GANs by imposing various types of differentiable augmentations on both real and fake samples. Previous attempts to directly augment the training data manipulate the distribution of real images, yielding little benefit; DiffAugment enables us to adopt the differentiable augmentation for the generated samples, effectively stabilizes training, and leads to better convergence. Experiments demonstrate consistent gains of our method over a variety of GAN architectures and loss functions for both unconditional and class-conditional generation. With DiffAugment, we achieve a state-of-the-art FID of 6.80 with an IS of 100.8 on ImageNet 128x128. Furthermore, with only 20% training data, we can match the top performance on CIFAR-10 and CIFAR-100. Finally, our method can generate high-fidelity images using only 100 images without pre-training, while being on par with existing transfer learning algorithms. Code is available at https://github.com/mit-han-lab/data-efficient-gans.
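
The core point, that the same augmentation is applied to both real and generated samples before they reach the discriminator, can be sketched as follows. This is a toy illustration rather than the released implementation: the brightness-shift augmentation, the hinge loss, and the names `augment`, `discriminator_loss`, and `generator_loss` are assumptions, and images are flat lists of pixel values. The brightness shift is an affine function of the pixels, so in an autodiff framework gradients would pass through it to the generator.

```python
import random


def augment(image, rng):
    """Random brightness shift, an affine (hence differentiable) pixel map."""
    shift = rng.random() - 0.5
    return [p + shift for p in image]


def discriminator_loss(D, real_batch, fake_batch, rng):
    """Hinge loss computed on augmented real AND augmented fake samples."""
    real_term = sum(max(0.0, 1.0 - D(augment(x, rng))) for x in real_batch)
    fake_term = sum(max(0.0, 1.0 + D(augment(x, rng))) for x in fake_batch)
    return real_term / len(real_batch) + fake_term / len(fake_batch)


def generator_loss(D, fake_batch, rng):
    """The generator is also scored through the augmentation."""
    return -sum(D(augment(x, rng)) for x in fake_batch) / len(fake_batch)
```

Here `D` is any callable that maps an image to a scalar score. Augmenting only the real images would correspond to the earlier approach that the abstract reports as yielding little benefit.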

Diverse Image Generation via Self-Conditioned GANs

Jun 18, 2020
Steven Liu, Tongzhou Wang, David Bau, Jun-Yan Zhu, Antonio Torralba

We introduce a simple but effective unsupervised method for generating realistic and diverse images. We train a class-conditional GAN model without using manually annotated class labels. Instead, our model is conditional on labels automatically derived from clustering in the discriminator's feature space. Our clustering step automatically discovers diverse modes, and explicitly requires the generator to cover them. Experiments on standard mode collapse benchmarks show that our method outperforms several competing methods when addressing mode collapse. Our method also performs well on large-scale datasets such as ImageNet and Places365, improving both image diversity and standard quality metrics, compared to previous methods.
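
The step of deriving labels from clustering can be sketched with a small k-means routine that turns feature vectors into pseudo-labels. The choice of k-means, the deterministic initialization, and the name `kmeans_labels` are assumptions made for illustration; the abstract only states that labels come from clustering in the discriminator's feature space. In this sketch `features` stands in for discriminator features of training images, and the returned labels would replace manually annotated classes when conditioning the GAN.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to `point`."""
    return min(range(len(centers)),
               key=lambda c: squared_distance(point, centers[c]))


def kmeans_labels(features, k, iterations=10):
    """Assign a pseudo-label in range(k) to each feature vector."""
    # Deterministic init for the sketch: pick centers spread over input order.
    centers = [list(features[i * len(features) // k]) for i in range(k)]
    labels = [0] * len(features)
    for _ in range(iterations):
        # Assignment step: each feature takes the label of its nearest center.
        labels = [nearest(f, centers) for f in features]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [f for f, l in zip(features, labels) if l == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Since each pseudo-label corresponds to one discovered cluster, conditioning the generator on every label in turn is what asks it to produce samples from every cluster.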

* CVPR 2020. Code: https://github.com/stevliu/self-conditioned-gan. Webpage: http://selfcondgan.csail.mit.edu/

Semantic Photo Manipulation with a Generative Image Prior

May 15, 2020
David Bau, Hendrik Strobelt, William Peebles, Jonas, Bolei Zhou, Jun-Yan Zhu, Antonio Torralba

Despite the recent success of GANs in synthesizing images conditioned on inputs such as a user sketch, text, or semantic labels, manipulating the high-level attributes of an existing natural photograph with GANs is challenging for two reasons. First, it is hard for GANs to precisely reproduce an input image. Second, after manipulation, the newly synthesized pixels often do not fit the original image. In this paper, we address these issues by adapting the image prior learned by GANs to image statistics of an individual image. Our method can accurately reconstruct the input image and synthesize new content, consistent with the appearance of the input image. We demonstrate our interactive system on several semantic image editing tasks, including synthesizing new objects consistent with background, removing unwanted objects, and changing the appearance of an object. Quantitative and qualitative comparisons against several existing methods demonstrate the effectiveness of our method.

* Bau, David, et al. "Semantic photo manipulation with a generative image prior." ACM Transactions on Graphics (TOG) 38.4 (2019)
* SIGGRAPH 2019

Transforming and Projecting Images into Class-conditional Generative Networks

May 04, 2020
Minyoung Huh, Richard Zhang, Jun-Yan Zhu, Sylvain Paris, Aaron Hertzmann

We present a method for projecting an input image into the space of a class-conditional generative neural network. We propose a method that optimizes for transformation to counteract the model biases in generative neural networks. Specifically, we demonstrate that one can solve for image translation, scale, and global color transformation during the projection optimization to address the object-center bias of a Generative Adversarial Network. This projection process poses a difficult optimization problem, and purely gradient-based optimizations fail to find good solutions. We describe a hybrid optimization strategy that finds good projections by estimating transformations and class parameters. We show the effectiveness of our method on real images and further demonstrate how the corresponding projections lead to better editability of these images.
