"Image To Image Translation": models, code, and papers

Palette: Image-to-Image Diffusion Models

Nov 10, 2021
Chitwan Saharia, William Chan, Huiwen Chang, Chris A. Lee, Jonathan Ho, Tim Salimans, David J. Fleet, Mohammad Norouzi

We introduce Palette, a simple and general framework for image-to-image translation using conditional diffusion models. On four challenging image-to-image translation tasks (colorization, inpainting, uncropping, and JPEG decompression), Palette outperforms strong GAN and regression baselines, and establishes a new state of the art. This is accomplished without task-specific hyper-parameter tuning, architecture customization, or any auxiliary loss, demonstrating a desirable degree of generality and flexibility. We uncover the impact of using $L_2$ vs. $L_1$ loss in the denoising diffusion objective on sample diversity, and demonstrate the importance of self-attention through empirical architecture studies. Importantly, we advocate a unified evaluation protocol based on ImageNet, and report several sample quality scores including FID, Inception Score, Classification Accuracy of a pre-trained ResNet-50, and Perceptual Distance against reference images for various baselines. We expect this standardized evaluation protocol to play a critical role in advancing image-to-image translation research. Finally, we show that a single generalist Palette model trained on 3 tasks (colorization, inpainting, JPEG decompression) performs as well or better than task-specific specialist counterparts.
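
As a rough illustration of the denoising diffusion objective the abstract refers to, the sketch below shows where the $L_1$ vs. $L_2$ choice enters a single training step. This is not the authors' implementation; the `denoiser` network, the noise schedule `alphas_cumprod`, and the channel-concatenation conditioning are assumptions made for illustration only.

```python
# Minimal sketch (not the Palette code) of a conditional denoising diffusion
# training step for image-to-image translation.
import torch
import torch.nn.functional as F

def diffusion_translation_loss(denoiser, x_src, x_tgt, alphas_cumprod, use_l1=True):
    """Corrupt the target image, condition on the clean source, predict the noise."""
    b = x_tgt.shape[0]
    t = torch.randint(0, len(alphas_cumprod), (b,), device=x_tgt.device)
    a_bar = alphas_cumprod[t].view(b, 1, 1, 1)

    noise = torch.randn_like(x_tgt)
    x_noisy = a_bar.sqrt() * x_tgt + (1 - a_bar).sqrt() * noise

    # The network sees the clean source and the noisy target concatenated along
    # channels (an assumed conditioning scheme) and predicts the added noise.
    pred_noise = denoiser(torch.cat([x_src, x_noisy], dim=1), t)

    # Per the abstract, the L1 vs. L2 choice in this objective affects sample diversity.
    return F.l1_loss(pred_noise, noise) if use_l1 else F.mse_loss(pred_noise, noise)
```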

  

Reusing Discriminators for Encoding: Towards Unsupervised Image-to-Image Translation

Mar 28, 2020
Runfa Chen, Wenbing Huang, Binghui Huang, Fuchun Sun, Bin Fang

Unsupervised image-to-image translation is a central task in computer vision. Current translation frameworks abandon the discriminator once training is completed. This paper proposes a novel role for the discriminator by reusing it to encode the images of the target domain. The proposed architecture, termed NICE-GAN, exhibits two advantages over previous approaches: first, it is more compact, since no independent encoding component is required; second, the plug-in encoder is trained directly by the adversarial loss, making it more informative and more effectively trained when a multi-scale discriminator is applied. The main issue in NICE-GAN is the coupling of translation with discrimination along the encoder, which could incur training inconsistency when playing the min-max game of GANs. To tackle this issue, we develop a decoupled training strategy in which the encoder is trained only when maximizing the adversarial loss and is kept frozen otherwise. Extensive experiments on four popular benchmarks demonstrate the superior performance of NICE-GAN over state-of-the-art methods in terms of FID, KID, and human preference. Comprehensive ablation studies are also carried out to isolate the validity of each proposed component. Our code is available at https://github.com/alpc91/NICE-GAN-pytorch.

* Accepted to CVPR 2020 
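
A minimal sketch of the decoupled training idea described in the abstract, assuming an LSGAN-style objective: the shared encoder (reused from the discriminator) receives gradient updates only in the maximization step and is frozen in the minimization step. Module and optimizer names are placeholders, not the API of the linked NICE-GAN-pytorch repository.

```python
# Sketch of decoupled min-max training with a shared (discriminator-reused) encoder.
import torch

def discriminator_step(encoder, disc_head, decoder, x_real, x_src, opt_d):
    """Max step: the encoder IS updated here (opt_d holds encoder + disc_head params)."""
    opt_d.zero_grad()
    with torch.no_grad():
        x_fake = decoder(encoder(x_src))           # translation; no decoder grads needed here
    d_real = disc_head(encoder(x_real))
    d_fake = disc_head(encoder(x_fake))
    loss_d = torch.mean((d_real - 1) ** 2) + torch.mean(d_fake ** 2)   # LSGAN-style
    loss_d.backward()
    opt_d.step()

def generator_step(encoder, disc_head, decoder, x_src, opt_g):
    """Min step: the encoder is kept frozen (opt_g holds decoder params only)."""
    opt_g.zero_grad()
    for p in encoder.parameters():
        p.requires_grad_(False)                    # freeze the shared encoder
    x_fake = decoder(encoder(x_src))
    loss_g = torch.mean((disc_head(encoder(x_fake)) - 1) ** 2)
    loss_g.backward()
    opt_g.step()
    for p in encoder.parameters():
        p.requires_grad_(True)                     # unfreeze for the next max step
```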
  

Auto-Encoding for Shared Cross Domain Feature Representation and Image-to-Image Translation

Jun 11, 2020
Safalya Pal

Image-to-image translation is a subset of computer vision and pattern recognition problems in which the goal is to learn a mapping between input images of domain $\mathbf{X}_1$ and output images of domain $\mathbf{X}_2$. Current methods use neural networks with an encoder-decoder structure to learn a mapping $G:\mathbf{X}_1 \to\mathbf{X}_2$ such that the distributions of images from $\mathbf{X}_2$ and $G(\mathbf{X}_1)$ are identical, where $G(\mathbf{X}_1) = d_G (f_G (\mathbf{X}_1))$, $f_G (\cdot)$ is referred to as the encoder, and $d_G(\cdot)$ is referred to as the decoder. Currently, methods that also compute an inverse mapping $F:\mathbf{X}_2 \to \mathbf{X}_1$ use a separate encoder-decoder pair $d_F (f_F (\mathbf{X}_2))$, or at least a separate decoder $d_F (\cdot)$, to do so. Here we introduce a method to perform cross-domain image-to-image translation across multiple domains using a single encoder-decoder architecture. We use an auto-encoder network which, given an input image $\mathbf{X}_1$, first computes a latent domain encoding $Z_d = f_d (\mathbf{X}_1)$ and a latent content encoding $Z_c = f_c (\mathbf{X}_1)$, where the domain encoding $Z_d$ and the content encoding $Z_c$ are independent. A decoder network $g(Z_d,Z_c)$ then creates a reconstruction of the original image, $\mathbf{\widehat{X}}_1=g(Z_d,Z_c )\approx \mathbf{X}_1$. Ideally, the domain encoding $Z_d$ contains no information about the content of the image, and the content encoding $Z_c$ contains no information about the domain of the image. We use this property of the encodings to find the mapping across domains $G:\mathbf{X}_1 \to \mathbf{X}_2$ by simply changing the domain encoding $Z_d$ of the decoder's input: $G(\mathbf{X}_1 )=g(f_d (\mathbf{x}_2^i ),f_c (\mathbf{X}_1))$, where $\mathbf{x}_2^i$ is the $i^{th}$ observation of $\mathbf{X}_2$.
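
A short sketch of the translation-by-swapping step $G(\mathbf{X}_1)=g(f_d(\mathbf{x}_2^i), f_c(\mathbf{X}_1))$ described above; the modules `f_d`, `f_c`, and `g` are placeholders for trained networks, not the author's code.

```python
# Illustrative sketch: translate by keeping the input's content code and
# swapping in the domain code of an example image from the target domain.
def translate(f_d, f_c, g, x1, x2_example):
    """G(x1) = g(f_d(x2_example), f_c(x1))."""
    z_d = f_d(x2_example)   # latent domain code taken from an observation of X2
    z_c = f_c(x1)           # latent content code taken from the input image
    return g(z_d, z_c)      # decode x1's content in x2's domain
```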

  

Mix and match networks: encoder-decoder alignment for zero-pair image translation

Apr 06, 2018
Yaxing Wang, Joost van de Weijer, Luis Herranz

We address the problem of image translation between domains or modalities for which no direct paired data is available (i.e. zero-pair translation). We propose mix and match networks, based on multiple encoders and decoders aligned in such a way that other encoder-decoder pairs can be composed at test time to perform unseen image translation tasks between domains or modalities for which explicit paired samples were not seen during training. We study the impact of autoencoders, side information and losses in improving the alignment and transferability of trained pairwise translation models to unseen translations. We show our approach is scalable and can perform colorization and style transfer between unseen combinations of domains. We evaluate our system in a challenging cross-modal setting where semantic segmentation is estimated from depth images, without explicit access to any depth-semantic segmentation training pairs. Our model outperforms baselines based on pix2pix and CycleGAN models.

* Accepted CVPR 2018 
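
The test-time composition can be illustrated with a short sketch; the dictionaries of encoders and decoders, and the modality names, are illustrative assumptions rather than the authors' code.

```python
# Sketch of "mix and match" composition: encoders and decoders are trained in
# pairs, but an unseen pairing is composed at test time for zero-pair translation.
def zero_pair_translate(encoders, decoders, x, src, tgt):
    """Compose an encoder and a decoder that were never trained together.

    encoders / decoders: dicts mapping a modality name to a trained network,
    e.g. encoders["depth"], decoders["segmentation"].
    """
    # Alignment of the shared latent space (via autoencoders, side information
    # and losses, per the abstract) is what makes this composition meaningful.
    return decoders[tgt](encoders[src](x))

# e.g. semantic segmentation from a depth image without depth-segmentation pairs:
# seg_map = zero_pair_translate(encoders, decoders, depth_image,
#                               src="depth", tgt="segmentation")
```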
  

Sketch-based Image Retrieval from Millions of Images under Rotation, Translation and Scale Variations

Oct 31, 2015
Sarthak Parui, Anurag Mittal

The proliferation of touch-based devices has made sketch-based image retrieval practical. While many methods exist for sketch-based object detection and image retrieval on small datasets, relatively little work has been done on large (web)-scale image retrieval. In this paper, we present an efficient approach for image retrieval from millions of images based on user-drawn sketches. Unlike existing methods for this problem, which are sensitive even to translation or scale variations, our method handles rotation, translation, scale (i.e., a similarity transformation) and small deformations. The object boundaries are represented as chains of connected segments, and the database images are pre-processed to obtain such chains that have a high chance of containing the object. This is accomplished using two approaches in this work: a) extracting long chains in contour segment networks and b) extracting boundaries of segmented object proposals. These chains are then represented by similarity-invariant variable length descriptors. Descriptor similarities are computed by a fast Dynamic Programming-based partial matching algorithm. This matching mechanism is used to generate a hierarchical k-medoids-based indexing structure for the extracted chains of all database images in an offline process, which is used to efficiently retrieve a small set of possible matched images for query chains. Finally, a geometric verification step is employed to test the geometric consistency of multiple chain matches to improve results. Qualitative and quantitative results clearly demonstrate the superiority of the approach over existing methods.

* submitted to IJCV, April 2015 
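
The Dynamic Programming-based partial matching of variable-length descriptor chains can be illustrated with a generic local-alignment recurrence (a Smith-Waterman-style sketch, not the paper's exact algorithm); `sim` is an assumed per-segment descriptor similarity and `gap` an assumed skip penalty.

```python
# Generic DP partial matching of two variable-length descriptor chains.
def partial_match_score(chain_a, chain_b, sim, gap=-0.5):
    n, m = len(chain_a), len(chain_b)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                0.0,                                                  # match may start anywhere
                dp[i - 1][j - 1] + sim(chain_a[i - 1], chain_b[j - 1]),
                dp[i - 1][j] + gap,                                   # skip a segment in chain_a
                dp[i][j - 1] + gap,                                   # skip a segment in chain_b
            )
            best = max(best, dp[i][j])                                # best partial (local) match
    return best
```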
  

Simple and Efficient Unpaired Real-world Super-Resolution using Image Statistics

Sep 19, 2021
Kwangjin Yoon

Learning a super-resolution (SR) network without paired low-resolution (LR) and high-resolution (HR) images is difficult because direct supervision through the corresponding HR counterpart is unavailable. Recently, much real-world SR research has taken advantage of unpaired image-to-image translation techniques. That is, two or more generative adversarial networks (GANs) are used, each of which translates images from one domain to another, e.g., from the HR domain to the LR domain. However, it is not easy to stably learn such a translation with GANs using unpaired data. In this study, we present a simple and efficient method for training a real-world SR network. To stably train the network, we use statistics of an image patch, such as means and variances. Our real-world SR framework consists of two GANs, one for translating HR images to LR images (degradation task) and the other for translating LR to HR (SR task). We argue that unpaired image translation using GANs can be learned efficiently with our proposed data sampling strategy, namely variance matching. We test our method on the NTIRE 2020 real-world SR dataset. Our method outperforms the current state-of-the-art method in terms of the SSIM metric and produces comparable results on the LPIPS metric.
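
A hedged sketch of what a variance-matching sampling strategy could look like: when drawing unpaired patches, pick the LR patch whose pixel variance is closest to that of the sampled HR patch. The details below are assumptions for illustration; the paper's exact sampling rule may differ.

```python
# Illustrative variance-matched sampling of unpaired HR/LR patches.
import numpy as np

def sample_variance_matched(hr_patches, lr_patches):
    """Return an HR patch and the unpaired LR patch with the closest variance."""
    hr = hr_patches[np.random.randint(len(hr_patches))]
    hr_var = hr.var()
    idx = int(np.argmin([abs(p.var() - hr_var) for p in lr_patches]))
    return hr, lr_patches[idx]
```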

  

UGAN: Untraceable GAN for Multi-Domain Face Translation

Sep 12, 2019
Defa Zhu, Si Liu, Wentao Jiang, Chen Gao, Tianyi Wu, Qiangchang Wang, Guodong Guo

Multi-domain image-to-image translation is a challenging task in which the goal is to translate an image into multiple different domains. Target-only characteristics are desired in the translated images, while source-only characteristics should be erased. However, recent methods often suffer from retaining characteristics of the source domain that are incompatible with the target domain. To address this issue, we propose a method called Untraceable GAN, which uses a novel source classifier to determine which domain an image was translated from and whether the translated image still retains characteristics of the source domain. Furthermore, we take the prototype of the target domain as guidance for the translator to effectively synthesize the target-only characteristics. The translator is trained to synthesize the target-only characteristics and to make the source domain untraceable for the discriminator, so that the source-only characteristics are erased. Finally, extensive experiments on three face editing tasks, including face aging, makeup, and expression editing, show that the proposed UGAN produces superior results over state-of-the-art models. The source code will be released.
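
A rough sketch of the untraceability objective described above: a source classifier tries to recover the source domain of a translated face, while the translator is trained to make that prediction uninformative. The KL-to-uniform formulation and the module names are assumptions for illustration, not the authors' released code.

```python
# Sketch of an "untraceable" adversarial objective for multi-domain translation.
import torch
import torch.nn.functional as F

def source_classifier_loss(src_clf, x_translated, source_domain_labels):
    # The classifier learns to trace which source domain a translated image came from.
    return F.cross_entropy(src_clf(x_translated.detach()), source_domain_labels)

def untraceable_loss(src_clf, x_translated, num_domains):
    # The translator pushes the classifier toward a uniform (uninformative) prediction,
    # erasing source-only characteristics.
    log_probs = F.log_softmax(src_clf(x_translated), dim=1)
    uniform = torch.full_like(log_probs, 1.0 / num_domains)
    return F.kl_div(log_probs, uniform, reduction="batchmean")
```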

  