Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"Image To Image Translation": models, code, and papers

Cartoon-to-real: An Approach to Translate Cartoon to Realistic Images using GAN

Nov 28, 2018
K M Arefeen Sultan, Labiba Kanij Rupty, Nahidul Islam Pranto, Sayed Khan Shuvo, Mohammad Imrul Jubair

We propose a method to translate cartoon images to real world images using Generative Aderserial Network (GAN). Existing GAN-based image-to-image translation methods which are trained on paired datasets are impractical as the data is difficult to accumulate. Therefore, in this paper we exploit the Cycle-Consistent Adversarial Networks (CycleGAN) method for images translation which needs an unpaired dataset. By applying CycleGAN we show that our model is able to generate meaningful real world images from cartoon images. However, we implement another state of the art technique $-$ Deep Analogy $-$ to compare the performance of our approach.

* 2 pages, Accepted (as poster) in International Conference on Innovation in Engineering and Technology (ICIET) 2018, 27 - 28 Dec 2018, Dhaka, Bangladesh 

Frequency Domain Image Translation: More Photo-realistic, Better Identity-preserving

Dec 01, 2020
Mu Cai, Hong Zhang, Huijuan Huang, Qichuan Geng, Gao Huang

Image-to-image translation aims at translating a particular style of an image to another. The synthesized images can be more photo-realistic and identity-preserving by decomposing the image into content and style in a disentangled manner. While existing models focus on designing specialized network architecture to separate the two components, this paper investigates how to explicitly constrain the content and style statistics of images. We achieve this goal by transforming the input image into high frequency and low frequency information, which correspond to the content and style, respectively. We regulate the frequency distribution from two aspects: a) a spatial level restriction to locally restrict the frequency distribution of images; b) a spectral level regulation to enhance the global consistency among images. On multiple datasets we show that the proposed approach consistently leads to significant improvements on top of various state-of-the-art image translation models.

* 13 pages 

MixerGAN: An MLP-Based Architecture for Unpaired Image-to-Image Translation

May 28, 2021
George Cazenavette, Manuel Ladron De Guevara

While attention-based transformer networks achieve unparalleled success in nearly all language tasks, the large number of tokens coupled with the quadratic activation memory usage makes them prohibitive for visual tasks. As such, while language-to-language translation has been revolutionized by the transformer model, convolutional networks remain the de facto solution for image-to-image translation. The recently proposed MLP-Mixer architecture alleviates some of the speed and memory issues associated with attention-based networks while still retaining the long-range connections that make transformer models desirable. Leveraging this efficient alternative to self-attention, we propose a new unpaired image-to-image translation model called MixerGAN: a simpler MLP-based architecture that considers long-distance relationships between pixels without the need for expensive attention mechanisms. Quantitative and qualitative analysis shows that MixerGAN achieves competitive results when compared to prior convolutional-based methods.

* Under Review for NeurIPS 2021 

Leveraging in-domain supervision for unsupervised image-to-image translation tasks via multi-stream generators

Dec 30, 2021
Dvir Yerushalmi, Dov Danon, Amit H. Bermano

Supervision for image-to-image translation (I2I) tasks is hard to come by, but bears significant effect on the resulting quality. In this paper, we observe that for many Unsupervised I2I (UI2I) scenarios, one domain is more familiar than the other, and offers in-domain prior knowledge, such as semantic segmentation. We argue that for complex scenes, figuring out the semantic structure of the domain is hard, especially with no supervision, but is an important part of a successful I2I operation. We hence introduce two techniques to incorporate this invaluable in-domain prior knowledge for the benefit of translation quality: through a novel Multi-Stream generator architecture, and through a semantic segmentation-based regularization loss term. In essence, we propose splitting the input data according to semantic masks, explicitly guiding the network to different behavior for the different regions of the image. In addition, we propose training a semantic segmentation network along with the translation task, and to leverage this output as a loss term that improves robustness. We validate our approach on urban data, demonstrating superior quality in the challenging UI2I tasks of converting day images to night ones. In addition, we also demonstrate how reinforcing the target dataset with our augmented images improves the training of downstream tasks such as the classical detection one.


Multi-View Image-to-Image Translation Supervised by 3D Pose

Apr 12, 2021
Idit Diamant, Oranit Dror, Hai Victor Habi, Arnon Netzer

We address the task of multi-view image-to-image translation for person image generation. The goal is to synthesize photo-realistic multi-view images with pose-consistency across all views. Our proposed end-to-end framework is based on a joint learning of multiple unpaired image-to-image translation models, one per camera viewpoint. The joint learning is imposed by constraints on the shared 3D human pose in order to encourage the 2D pose projections in all views to be consistent. Experimental results on the CMU-Panoptic dataset demonstrate the effectiveness of the suggested framework in generating photo-realistic images of persons with new poses that are more consistent across all views in comparison to a standard Image-to-Image baseline. The code is available at:

* *equal contribution 

DA-GAN: Instance-level Image Translation by Deep Attention Generative Adversarial Networks (with Supplementary Materials)

Feb 18, 2018
Shuang Ma, Jianlong Fu, Chang Wen Chen, Tao Mei

Unsupervised image translation, which aims in translating two independent sets of images, is challenging in discovering the correct correspondences without paired data. Existing works build upon Generative Adversarial Network (GAN) such that the distribution of the translated images are indistinguishable from the distribution of the target set. However, such set-level constraints cannot learn the instance-level correspondences (e.g. aligned semantic parts in object configuration task). This limitation often results in false positives (e.g. geometric or semantic artifacts), and further leads to mode collapse problem. To address the above issues, we propose a novel framework for instance-level image translation by Deep Attention GAN (DA-GAN). Such a design enables DA-GAN to decompose the task of translating samples from two sets into translating instances in a highly-structured latent space. Specifically, we jointly learn a deep attention encoder, and the instancelevel correspondences could be consequently discovered through attending on the learned instance pairs. Therefore, the constraints could be exploited on both set-level and instance-level. Comparisons against several state-ofthe- arts demonstrate the superiority of our approach, and the broad application capability, e.g, pose morphing, data augmentation, etc., pushes the margin of domain translation problem.

* Computer Vision and Pattern Recognition 

Multi-Density Sketch-to-Image Translation Network

Jun 18, 2020
Jialu Huang, Jing Liao, Zhifeng Tan, Sam Kwong

Sketch-to-image (S2I) translation plays an important role in image synthesis and manipulation tasks, such as photo editing and colorization. Some specific S2I translation including sketch-to-photo and sketch-to-painting can be used as powerful tools in the art design industry. However, previous methods only support S2I translation with a single level of density, which gives less flexibility to users for controlling the input sketches. In this work, we propose the first multi-level density sketch-to-image translation framework, which allows the input sketch to cover a wide range from rough object outlines to micro structures. Moreover, to tackle the problem of noncontinuous representation of multi-level density input sketches, we project the density level into a continuous latent space, which can then be linearly controlled by a parameter. This allows users to conveniently control the densities of input sketches and generation of images. Moreover, our method has been successfully verified on various datasets for different applications including face editing, multi-modal sketch-to-photo translation, and anime colorization, providing coarse-to-fine levels of controls to these applications.

* 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works 

Cross-domain Correspondence Learning for Exemplar-based Image Translation

Apr 12, 2020
Pan Zhang, Bo Zhang, Dong Chen, Lu Yuan, Fang Wen

We present a general framework for exemplar-based image translation, which synthesizes a photo-realistic image from the input in a distinct domain (e.g., semantic segmentation mask, or edge map, or pose keypoints), given an exemplar image. The output has the style (e.g., color, texture) in consistency with the semantically corresponding objects in the exemplar. We propose to jointly learn the crossdomain correspondence and the image translation, where both tasks facilitate each other and thus can be learned with weak supervision. The images from distinct domains are first aligned to an intermediate domain where dense correspondence is established. Then, the network synthesizes images based on the appearance of semantically corresponding patches in the exemplar. We demonstrate the effectiveness of our approach in several image translation tasks. Our method is superior to state-of-the-art methods in terms of image quality significantly, with the image style faithful to the exemplar with semantic consistency. Moreover, we show the utility of our method for several applications

* CVPR 2020 
* Accepted as a CVPR 2020 oral paper 

Augmented Cyclic Consistency Regularization for Unpaired Image-to-Image Translation

Feb 29, 2020
Takehiko Ohkawa, Naoto Inoue, Hirokatsu Kataoka, Nakamasa Inoue

Unpaired image-to-image (I2I) translation has received considerable attention in pattern recognition and computer vision because of recent advancements in generative adversarial networks (GANs). However, due to the lack of explicit supervision, unpaired I2I models often fail to generate realistic images, especially in challenging datasets with different backgrounds and poses. Hence, stabilization is indispensable for real-world applications and GANs. Herein, we propose Augmented Cyclic Consistency Regularization (ACCR), a novel regularization method for unpaired I2I translation. Our main idea is to enforce consistency regularization originating from semi-supervised learning on the discriminators leveraging real, fake, reconstructed, and augmented samples. We regularize the discriminators to output similar predictions when fed pairs of original and perturbed images. We qualitatively clarify the generation property between unpaired I2I models and standard GANs, and explain why consistency regularization on fake and reconstructed samples works well. Quantitatively, our method outperforms the consistency regularized GAN (CR-GAN) in digit translations and demonstrates efficacy against several data augmentation variants and cycle-consistent constraints.


Informative Sample Mining Network for Multi-Domain Image-to-Image Translation

Jan 05, 2020
Jie Cao, Huaibo Huang, Yi Li, Ran He, Zhenan Sun

The performance of multi-domain image-to-image translation has been significantly improved by recent progress in deep generative models. Existing approaches can use a unified model to achieve translations between all the visual domains. However, their outcomes are far from satisfying when there are large domain variations. In this paper, we reveal that improving the strategy of sample selection is an effective solution. To select informative samples, we dynamically estimate sample importance during the training of Generative Adversarial Networks, presenting Informative Sample Mining Network. We theoretically analyze the relationship between the sample importance and the prediction of the global optimal discriminator. Then a practical importance estimation function based on general discriminators is derived. In addition, we propose a novel multi-stage sample training scheme to reduce sample hardness while preserving sample informativeness. Extensive experiments on a wide range of specific image-to-image translation tasks are conducted, and the results demonstrate our superiority over current state-of-the-art methods.