Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"Image To Image Translation": models, code, and papers

Can Giraffes Become Birds? An Evaluation of Image-to-image Translation for Data Generation

Jan 10, 2020
Daniel V. Ruiz, Gabriel Salomon, Eduardo Todt

There is an increasing interest in image-to-image translation with applications ranging from generating maps from satellite images to creating entire clothes' images from only contours. In the present work, we investigate image-to-image translation using Generative Adversarial Networks (GANs) for generating new data, taking as a case study the morphing of giraffes images into bird images. Morphing a giraffe into a bird is a challenging task, as they have different scales, textures, and morphology. An unsupervised cross-domain translator entitled InstaGAN was trained on giraffes and birds, along with their respective masks, to learn translation between both domains. A dataset of synthetic bird images was generated using translation from originally giraffe images while preserving the original spatial arrangement and background. It is important to stress that the generated birds do not exist, being only the result of a latent representation learned by InstaGAN. Two subsets of common literature datasets were used for training the GAN and generating the translated images: COCO and Caltech-UCSD Birds 200-2011. To evaluate the realness and quality of the generated images and masks, qualitative and quantitative analyses were made. For the quantitative analysis, a pre-trained Mask R-CNN was used for the detection and segmentation of birds on Pascal VOC, Caltech-UCSD Birds 200-2011, and our new dataset entitled FakeSet. The generated dataset achieved detection and segmentation results close to the real datasets, suggesting that the generated images are realistic enough to be detected and segmented by a state-of-the-art deep neural network.

* Accepted for presentation at the Computer on the Beach (COTB'20) 2020 
Access Paper or Ask Questions

Unified Generative Adversarial Networks for Controllable Image-to-Image Translation

Dec 12, 2019
Hao Tang, Hong Liu, Nicu Sebe

Controllable image-to-image translation, i.e., transferring an image from a source domain to a target one guided by controllable structures, has attracted much attention in both academia and industry. In this paper, we propose a unified Generative Adversarial Network (GAN) framework for controllable image-to-image translation. In addition to conditioning on a reference image, we show how the model can generate images conditioned on controllable structures, e.g., class labels, object keypoints, human skeletons and scene semantic maps. The proposed GAN framework consists of a single generator and a discriminator taking a conditional image and the target controllable structure as input. In this way, the conditional image can provide appearance information and the controllable structure can provide the structure information for generating the target result. Moreover, the proposed GAN learns the image-to-image mapping through three novel losses, i.e., color loss, controllable structure-guided cycle-consistency loss and controllable structure-guided self-identity preserving loss. Note that the proposed color loss handles the issue of "channel pollution" when back-propagating the gradients. In addition, we present the Fr\'echet ResNet Distance (FRD) to evaluate the quality of generated images. Extensive qualitative and quantitative experiments on two challenging image translation tasks with four different datasets demonstrate that the proposed GAN model generates convincing results, and significantly outperforms other state-of-the-art methods on both tasks. Meanwhile, the proposed GAN framework is a unified solution, thus it can be applied to solving other controllable structure-guided image-to-image translation tasks, such as landmark-guided facial expression translation and keypoint-guided person image generation.

* An extended version of a paper published in ACM MM 2018. arXiv admin note: substantial text overlap with arXiv:1808.04859. The source code, data and trained models are available at 
Access Paper or Ask Questions

Panoptic-based Object Style-Align for Image-to-Image Translation

Dec 03, 2021
Liyun Zhang, Photchara Ratsamee, Bowen Wang, Manabu Higashida, Yuki Uranishi, Haruo Takemura

Despite remarkable recent progress in image translation, the complex scene with multiple discrepant objects remains a challenging problem. Because the translated images have low fidelity and tiny objects in fewer details and obtain unsatisfactory performance in object recognition. Without the thorough object perception (i.e., bounding boxes, categories, and masks) of the image as prior knowledge, the style transformation of each object will be difficult to track in the image translation process. We propose panoptic-based object style-align generative adversarial networks (POSA-GANs) for image-to-image translation together with a compact panoptic segmentation dataset. The panoptic segmentation model is utilized to extract panoptic-level perception (i.e., overlap-removed foreground object instances and background semantic regions in the image). This is utilized to guide the alignment between the object content codes of the input domain image and object style codes sampled from the style space of the target domain. The style-aligned object representations are further transformed to obtain precise boundaries layout for higher fidelity object generation. The proposed method was systematically compared with different competing methods and obtained significant improvement on both image quality and object recognition performance for translated images.

Access Paper or Ask Questions

Multimodal Image-to-Image Translation via Mutual Information Estimation and Maximization

Sep 06, 2020
Zhiwen Zuo, Qijiang Xu, Huiming Zhang, Zhizhong Wang, Haibo Chen, Ailin Li, Lei Zhao, Wei Xing, Dongming Lu

In this paper, we present a novel framework that can achieve multimodal image-to-image translation by simply encouraging the statistical dependence between the latent code and the output image in conditional generative adversarial networks. In addition, by incorporating a U-net generator into our framework, our method only needs to learn a one-sided translation model from the source image domain to the target image domain for both supervised and unsupervised multimodal image-to-image translation. Furthermore, our method also achieves disentanglement between the source domain content and the target domain style for free. We conduct experiments under supervised and unsupervised settings on various benchmark image-to-image translation datasets compared with the state-of-the-art methods, showing the effectiveness and simplicity of our method to achieve multimodal and high-quality results.

Access Paper or Ask Questions

Contrastive Learning for Unsupervised Image-to-Image Translation

May 07, 2021
Hanbit Lee, Jinseok Seol, Sang-goo Lee

Image-to-image translation aims to learn a mapping between different groups of visually distinguishable images. While recent methods have shown impressive ability to change even intricate appearance of images, they still rely on domain labels in training a model to distinguish between distinct visual features. Such dependency on labels often significantly limits the scope of applications since consistent and high-quality labels are expensive. Instead, we wish to capture visual features from images themselves and apply them to enable realistic translation without human-generated labels. To this end, we propose an unsupervised image-to-image translation method based on contrastive learning. The key idea is to learn a discriminator that differentiates between distinctive styles and let the discriminator supervise a generator to transfer those styles across images. During training, we randomly sample a pair of images and train the generator to change the appearance of one towards another while keeping the original structure. Experimental results show that our method outperforms the leading unsupervised baselines in terms of visual quality and translation accuracy.

Access Paper or Ask Questions