Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"Image To Image Translation": models, code, and papers

Unsupervised Image-to-Image Translation with Generative Adversarial Networks

Jan 10, 2017
Hao Dong, Paarth Neekhara, Chao Wu, Yike Guo

It's useful to automatically transform an image from its original form to some synthetic form (style, partial contents, etc.), while keeping the original structure or semantics. We define this requirement as the "image-to-image translation" problem, and propose a general approach to achieve it, based on deep convolutional and conditional generative adversarial networks (GANs), which has gained a phenomenal success to learn mapping images from noise input since 2014. In this work, we develop a two step (unsupervised) learning method to translate images between different domains by using unlabeled images without specifying any correspondence between them, so that to avoid the cost of acquiring labeled data. Compared with prior works, we demonstrated the capacity of generality in our model, by which variance of translations can be conduct by a single type of model. Such capability is desirable in applications like bidirectional translation


Leveraging Local Domains for Image-to-Image Translation

Sep 09, 2021
Anthony Dell'Eva, Fabio Pizzati, Massimo Bertozzi, Raoul de Charette

Image-to-image (i2i) networks struggle to capture local changes because they do not affect the global scene structure. For example, translating from highway scenes to offroad, i2i networks easily focus on global color features but ignore obvious traits for humans like the absence of lane markings. In this paper, we leverage human knowledge about spatial domain characteristics which we refer to as 'local domains' and demonstrate its benefit for image-to-image translation. Relying on a simple geometrical guidance, we train a patch-based GAN on few source data and hallucinate a new unseen domain which subsequently eases transfer learning to target. We experiment on three tasks ranging from unstructured environments to adverse weather. Our comprehensive evaluation setting shows we are able to generate realistic translations, with minimal priors, and training only on a few images. Furthermore, when trained on our translations images we show that all tested proxy tasks are significantly improved, without ever seeing target domain at training.

* Submitted to conference 

DeepI2I: Enabling Deep Hierarchical Image-to-Image Translation by Transferring from GANs

Nov 11, 2020
Yaxing Wang, Lu Yu, Joost van de Weijer

Image-to-image translation has recently achieved remarkable results. But despite current success, it suffers from inferior performance when translations between classes require large shape changes. We attribute this to the high-resolution bottlenecks which are used by current state-of-the-art image-to-image methods. Therefore, in this work, we propose a novel deep hierarchical Image-to-Image Translation method, called DeepI2I. We learn a model by leveraging hierarchical features: (a) structural information contained in the shallow layers and (b) semantic information extracted from the deep layers. To enable the training of deep I2I models on small datasets, we propose a novel transfer learning method, that transfers knowledge from pre-trained GANs. Specifically, we leverage the discriminator of a pre-trained GANs (i.e. BigGAN or StyleGAN) to initialize both the encoder and the discriminator and the pre-trained generator to initialize the generator of our model. Applying knowledge transfer leads to an alignment problem between the encoder and generator. We introduce an adaptor network to address this. On many-class image-to-image translation on three datasets (Animal faces, Birds, and Foods) we decrease mFID by at least 35% when compared to the state-of-the-art. Furthermore, we qualitatively and quantitatively demonstrate that transfer learning significantly improves the performance of I2I systems, especially for small datasets. Finally, we are the first to perform I2I translations for domains with over 100 classes.

* NeurIPS2020 

Region-aware Knowledge Distillation for Efficient Image-to-Image Translation

May 25, 2022
Linfeng Zhang, Xin Chen, Runpei Dong, Kaisheng Ma

Recent progress in image-to-image translation has witnessed the success of generative adversarial networks (GANs). However, GANs usually contain a huge number of parameters, which lead to intolerant memory and computation consumption and limit their deployment on edge devices. To address this issue, knowledge distillation is proposed to transfer the knowledge from a cumbersome teacher model to an efficient student model. However, most previous knowledge distillation methods are designed for image classification and lead to limited performance in image-to-image translation. In this paper, we propose Region-aware Knowledge Distillation ReKo to compress image-to-image translation models. Firstly, ReKo adaptively finds the crucial regions in the images with an attention module. Then, patch-wise contrastive learning is adopted to maximize the mutual information between students and teachers in these crucial regions. Experiments with eight comparison methods on nine datasets demonstrate the substantial effectiveness of ReKo on both paired and unpaired image-to-image translation. For instance, our 7.08X compressed and 6.80X accelerated CycleGAN student outperforms its teacher by 1.33 and 1.04 FID scores on Horse to Zebra and Zebra to Horse, respectively. Codes will be released on GitHub.


Image-to-Image Translation with Multi-Path Consistency Regularization

May 29, 2019
Jianxin Lin, Yingce Xia, Yijun Wang, Tao Qin, Zhibo Chen

Image translation across different domains has attracted much attention in both machine learning and computer vision communities. Taking the translation from source domain $\mathcal{D}_s$ to target domain $\mathcal{D}_t$ as an example, existing algorithms mainly rely on two kinds of loss for training: One is the discrimination loss, which is used to differentiate images generated by the models and natural images; the other is the reconstruction loss, which measures the difference between an original image and the reconstructed version through $\mathcal{D}_s\to\mathcal{D}_t\to\mathcal{D}_s$ translation. In this work, we introduce a new kind of loss, multi-path consistency loss, which evaluates the differences between direct translation $\mathcal{D}_s\to\mathcal{D}_t$ and indirect translation $\mathcal{D}_s\to\mathcal{D}_a\to\mathcal{D}_t$ with $\mathcal{D}_a$ as an auxiliary domain, to regularize training. For multi-domain translation (at least, three) which focuses on building translation models between any two domains, at each training iteration, we randomly select three domains, set them respectively as the source, auxiliary and target domains, build the multi-path consistency loss and optimize the network. For two-domain translation, we need to introduce an additional auxiliary domain and construct the multi-path consistency loss. We conduct various experiments to demonstrate the effectiveness of our proposed methods, including face-to-face translation, paint-to-photo translation, and de-raining/de-noising translation.

* 8 pages, 6 figures. Accepted by the 28th International Joint Conference on Artificial Intelligence (IJCAI-2019) 

Dual Generator Generative Adversarial Networks for Multi-Domain Image-to-Image Translation

Jan 14, 2019
Hao Tang, Dan Xu, Wei Wang, Yan Yan, Nicu Sebe

State-of-the-art methods for image-to-image translation with Generative Adversarial Networks (GANs) can learn a mapping from one domain to another domain using unpaired image data. However, these methods require the training of one specific model for every pair of image domains, which limits the scalability in dealing with more than two image domains. In addition, the training stage of these methods has the common problem of model collapse that degrades the quality of the generated images. To tackle these issues, we propose a Dual Generator Generative Adversarial Network (G$^2$GAN), which is a robust and scalable approach allowing to perform unpaired image-to-image translation for multiple domains using only dual generators within a single model. Moreover, we explore different optimization losses for better training of G$^2$GAN, and thus make unpaired image-to-image translation with higher consistency and better stability. Extensive experiments on six publicly available datasets with different scenarios, i.e., architectural buildings, seasons, landscape and human faces, demonstrate that the proposed G$^2$GAN achieves superior model capacity and better generation performance comparing with existing image-to-image translation GAN models.

* 16 pages, 7 figures, accepted to ACCV 2018 

Face Translation between Images and Videos using Identity-aware CycleGAN

Dec 04, 2017
Zhiwu Huang, Bernhard Kratzwald, Danda Pani Paudel, Jiqing Wu, Luc Van Gool

This paper presents a new problem of unpaired face translation between images and videos, which can be applied to facial video prediction and enhancement. In this problem there exist two major technical challenges: 1) designing a robust translation model between static images and dynamic videos, and 2) preserving facial identity during image-video translation. To address such two problems, we generalize the state-of-the-art image-to-image translation network (Cycle-Consistent Adversarial Networks) to the image-to-video/video-to-image translation context by exploiting a image-video translation model and an identity preservation model. In particular, we apply the state-of-the-art Wasserstein GAN technique to the setting of image-video translation for better convergence, and we meanwhile introduce a face verificator to ensure the identity. Experiments on standard image/video face datasets demonstrate the effectiveness of the proposed model in both terms of qualitative and quantitative evaluations.


Realistic Endoscopic Image Generation Method Using Virtual-to-real Image-domain Translation

Jan 13, 2022
Masahiro Oda, Kiyohito Tanaka, Hirotsugu Takabatake, Masaki Mori, Hiroshi Natori, Kensaku Mori

This paper proposes a realistic image generation method for visualization in endoscopic simulation systems. Endoscopic diagnosis and treatment are performed in many hospitals. To reduce complications related to endoscope insertions, endoscopic simulation systems are used for training or rehearsal of endoscope insertions. However, current simulation systems generate non-realistic virtual endoscopic images. To improve the value of the simulation systems, improvement of reality of their generated images is necessary. We propose a realistic image generation method for endoscopic simulation systems. Virtual endoscopic images are generated by using a volume rendering method from a CT volume of a patient. We improve the reality of the virtual endoscopic images using a virtual-to-real image-domain translation technique. The image-domain translator is implemented as a fully convolutional network (FCN). We train the FCN by minimizing a cycle consistency loss function. The FCN is trained using unpaired virtual and real endoscopic images. To obtain high quality image-domain translation results, we perform an image cleansing to the real endoscopic image set. We tested to use the shallow U-Net, U-Net, deep U-Net, and U-Net having residual units as the image-domain translator. The deep U-Net and U-Net having residual units generated quite realistic images.

* Healthcare Technology Letters, Vol.6, No.6, pp.214-219, 2019 
* Accepted paper as an oral presentation at the Joint MICCAI workshop MIAR | AE-CAI | CARE 2019 

StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation

Sep 21, 2018
Yunjey Choi, Minje Choi, Munyoung Kim, Jung-Woo Ha, Sunghun Kim, Jaegul Choo

Recent studies have shown remarkable success in image-to-image translation for two domains. However, existing approaches have limited scalability and robustness in handling more than two domains, since different models should be built independently for every pair of image domains. To address this limitation, we propose StarGAN, a novel and scalable approach that can perform image-to-image translations for multiple domains using only a single model. Such a unified model architecture of StarGAN allows simultaneous training of multiple datasets with different domains within a single network. This leads to StarGAN's superior quality of translated images compared to existing models as well as the novel capability of flexibly translating an input image to any desired target domain. We empirically demonstrate the effectiveness of our approach on a facial attribute transfer and a facial expression synthesis tasks.

* IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 8789-8797 
* Accepted to CVPR 2018 (Oral) 

VQBB: Image-to-image Translation with Vector Quantized Brownian Bridge

May 16, 2022
Bo Li, Kaitao Xue, Bin Liu, Yu-Kun Lai

Image-to-image translation is an important and challenging problem in computer vision. Existing approaches like Pixel2Pixel, DualGAN suffer from the instability of GAN and fail to generate diverse outputs because they model the task as a one-to-one mapping. Although diffusion models can generate images with high quality and diversity, current conditional diffusion models still can not maintain high similarity with the condition image on image-to-image translation tasks due to the Gaussian noise added in the reverse process. To address these issues, a novel Vector Quantized Brownian Bridge(VQBB) diffusion model is proposed in this paper. On one hand, Brownian Bridge diffusion process can model the transformation between two domains more accurate and flexible than the existing Markov diffusion methods. As far as the authors know, it is the first work for Brownian Bridge diffusion process proposed for image-to-image translation. On the other hand, the proposed method improved the learning efficiency and translation accuracy by confining the diffusion process in the quantized latent space. Finally, numerical experimental results validated the performance of the proposed method.

* 5 pages, 5 figures