Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"Image To Image Translation": models, code, and papers

DeepI2I: Enabling Deep Hierarchical Image-to-Image Translation by Transferring from GANs

Nov 11, 2020
Yaxing Wang, Lu Yu, Joost van de Weijer

Image-to-image translation has recently achieved remarkable results. But despite current success, it suffers from inferior performance when translations between classes require large shape changes. We attribute this to the high-resolution bottlenecks which are used by current state-of-the-art image-to-image methods. Therefore, in this work, we propose a novel deep hierarchical Image-to-Image Translation method, called DeepI2I. We learn a model by leveraging hierarchical features: (a) structural information contained in the shallow layers and (b) semantic information extracted from the deep layers. To enable the training of deep I2I models on small datasets, we propose a novel transfer learning method, that transfers knowledge from pre-trained GANs. Specifically, we leverage the discriminator of a pre-trained GANs (i.e. BigGAN or StyleGAN) to initialize both the encoder and the discriminator and the pre-trained generator to initialize the generator of our model. Applying knowledge transfer leads to an alignment problem between the encoder and generator. We introduce an adaptor network to address this. On many-class image-to-image translation on three datasets (Animal faces, Birds, and Foods) we decrease mFID by at least 35% when compared to the state-of-the-art. Furthermore, we qualitatively and quantitatively demonstrate that transfer learning significantly improves the performance of I2I systems, especially for small datasets. Finally, we are the first to perform I2I translations for domains with over 100 classes.

* NeurIPS2020 
Access Paper or Ask Questions

Region-aware Knowledge Distillation for Efficient Image-to-Image Translation

May 25, 2022
Linfeng Zhang, Xin Chen, Runpei Dong, Kaisheng Ma

Recent progress in image-to-image translation has witnessed the success of generative adversarial networks (GANs). However, GANs usually contain a huge number of parameters, which lead to intolerant memory and computation consumption and limit their deployment on edge devices. To address this issue, knowledge distillation is proposed to transfer the knowledge from a cumbersome teacher model to an efficient student model. However, most previous knowledge distillation methods are designed for image classification and lead to limited performance in image-to-image translation. In this paper, we propose Region-aware Knowledge Distillation ReKo to compress image-to-image translation models. Firstly, ReKo adaptively finds the crucial regions in the images with an attention module. Then, patch-wise contrastive learning is adopted to maximize the mutual information between students and teachers in these crucial regions. Experiments with eight comparison methods on nine datasets demonstrate the substantial effectiveness of ReKo on both paired and unpaired image-to-image translation. For instance, our 7.08X compressed and 6.80X accelerated CycleGAN student outperforms its teacher by 1.33 and 1.04 FID scores on Horse to Zebra and Zebra to Horse, respectively. Codes will be released on GitHub.

Access Paper or Ask Questions

Image-to-Image Translation with Multi-Path Consistency Regularization

May 29, 2019
Jianxin Lin, Yingce Xia, Yijun Wang, Tao Qin, Zhibo Chen

Image translation across different domains has attracted much attention in both machine learning and computer vision communities. Taking the translation from source domain $\mathcal{D}_s$ to target domain $\mathcal{D}_t$ as an example, existing algorithms mainly rely on two kinds of loss for training: One is the discrimination loss, which is used to differentiate images generated by the models and natural images; the other is the reconstruction loss, which measures the difference between an original image and the reconstructed version through $\mathcal{D}_s\to\mathcal{D}_t\to\mathcal{D}_s$ translation. In this work, we introduce a new kind of loss, multi-path consistency loss, which evaluates the differences between direct translation $\mathcal{D}_s\to\mathcal{D}_t$ and indirect translation $\mathcal{D}_s\to\mathcal{D}_a\to\mathcal{D}_t$ with $\mathcal{D}_a$ as an auxiliary domain, to regularize training. For multi-domain translation (at least, three) which focuses on building translation models between any two domains, at each training iteration, we randomly select three domains, set them respectively as the source, auxiliary and target domains, build the multi-path consistency loss and optimize the network. For two-domain translation, we need to introduce an additional auxiliary domain and construct the multi-path consistency loss. We conduct various experiments to demonstrate the effectiveness of our proposed methods, including face-to-face translation, paint-to-photo translation, and de-raining/de-noising translation.

* 8 pages, 6 figures. Accepted by the 28th International Joint Conference on Artificial Intelligence (IJCAI-2019) 
Access Paper or Ask Questions

Dual Generator Generative Adversarial Networks for Multi-Domain Image-to-Image Translation

Jan 14, 2019
Hao Tang, Dan Xu, Wei Wang, Yan Yan, Nicu Sebe

State-of-the-art methods for image-to-image translation with Generative Adversarial Networks (GANs) can learn a mapping from one domain to another domain using unpaired image data. However, these methods require the training of one specific model for every pair of image domains, which limits the scalability in dealing with more than two image domains. In addition, the training stage of these methods has the common problem of model collapse that degrades the quality of the generated images. To tackle these issues, we propose a Dual Generator Generative Adversarial Network (G$^2$GAN), which is a robust and scalable approach allowing to perform unpaired image-to-image translation for multiple domains using only dual generators within a single model. Moreover, we explore different optimization losses for better training of G$^2$GAN, and thus make unpaired image-to-image translation with higher consistency and better stability. Extensive experiments on six publicly available datasets with different scenarios, i.e., architectural buildings, seasons, landscape and human faces, demonstrate that the proposed G$^2$GAN achieves superior model capacity and better generation performance comparing with existing image-to-image translation GAN models.

* 16 pages, 7 figures, accepted to ACCV 2018 
Access Paper or Ask Questions

Face Translation between Images and Videos using Identity-aware CycleGAN

Dec 04, 2017
Zhiwu Huang, Bernhard Kratzwald, Danda Pani Paudel, Jiqing Wu, Luc Van Gool

This paper presents a new problem of unpaired face translation between images and videos, which can be applied to facial video prediction and enhancement. In this problem there exist two major technical challenges: 1) designing a robust translation model between static images and dynamic videos, and 2) preserving facial identity during image-video translation. To address such two problems, we generalize the state-of-the-art image-to-image translation network (Cycle-Consistent Adversarial Networks) to the image-to-video/video-to-image translation context by exploiting a image-video translation model and an identity preservation model. In particular, we apply the state-of-the-art Wasserstein GAN technique to the setting of image-video translation for better convergence, and we meanwhile introduce a face verificator to ensure the identity. Experiments on standard image/video face datasets demonstrate the effectiveness of the proposed model in both terms of qualitative and quantitative evaluations.

Access Paper or Ask Questions

Realistic Endoscopic Image Generation Method Using Virtual-to-real Image-domain Translation

Jan 13, 2022
Masahiro Oda, Kiyohito Tanaka, Hirotsugu Takabatake, Masaki Mori, Hiroshi Natori, Kensaku Mori

This paper proposes a realistic image generation method for visualization in endoscopic simulation systems. Endoscopic diagnosis and treatment are performed in many hospitals. To reduce complications related to endoscope insertions, endoscopic simulation systems are used for training or rehearsal of endoscope insertions. However, current simulation systems generate non-realistic virtual endoscopic images. To improve the value of the simulation systems, improvement of reality of their generated images is necessary. We propose a realistic image generation method for endoscopic simulation systems. Virtual endoscopic images are generated by using a volume rendering method from a CT volume of a patient. We improve the reality of the virtual endoscopic images using a virtual-to-real image-domain translation technique. The image-domain translator is implemented as a fully convolutional network (FCN). We train the FCN by minimizing a cycle consistency loss function. The FCN is trained using unpaired virtual and real endoscopic images. To obtain high quality image-domain translation results, we perform an image cleansing to the real endoscopic image set. We tested to use the shallow U-Net, U-Net, deep U-Net, and U-Net having residual units as the image-domain translator. The deep U-Net and U-Net having residual units generated quite realistic images.

* Healthcare Technology Letters, Vol.6, No.6, pp.214-219, 2019 
* Accepted paper as an oral presentation at the Joint MICCAI workshop MIAR | AE-CAI | CARE 2019 
Access Paper or Ask Questions

StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation

Sep 21, 2018
Yunjey Choi, Minje Choi, Munyoung Kim, Jung-Woo Ha, Sunghun Kim, Jaegul Choo

Recent studies have shown remarkable success in image-to-image translation for two domains. However, existing approaches have limited scalability and robustness in handling more than two domains, since different models should be built independently for every pair of image domains. To address this limitation, we propose StarGAN, a novel and scalable approach that can perform image-to-image translations for multiple domains using only a single model. Such a unified model architecture of StarGAN allows simultaneous training of multiple datasets with different domains within a single network. This leads to StarGAN's superior quality of translated images compared to existing models as well as the novel capability of flexibly translating an input image to any desired target domain. We empirically demonstrate the effectiveness of our approach on a facial attribute transfer and a facial expression synthesis tasks.

* IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 8789-8797 
* Accepted to CVPR 2018 (Oral) 
Access Paper or Ask Questions

VQBB: Image-to-image Translation with Vector Quantized Brownian Bridge

May 16, 2022
Bo Li, Kaitao Xue, Bin Liu, Yu-Kun Lai

Image-to-image translation is an important and challenging problem in computer vision. Existing approaches like Pixel2Pixel, DualGAN suffer from the instability of GAN and fail to generate diverse outputs because they model the task as a one-to-one mapping. Although diffusion models can generate images with high quality and diversity, current conditional diffusion models still can not maintain high similarity with the condition image on image-to-image translation tasks due to the Gaussian noise added in the reverse process. To address these issues, a novel Vector Quantized Brownian Bridge(VQBB) diffusion model is proposed in this paper. On one hand, Brownian Bridge diffusion process can model the transformation between two domains more accurate and flexible than the existing Markov diffusion methods. As far as the authors know, it is the first work for Brownian Bridge diffusion process proposed for image-to-image translation. On the other hand, the proposed method improved the learning efficiency and translation accuracy by confining the diffusion process in the quantized latent space. Finally, numerical experimental results validated the performance of the proposed method.

* 5 pages, 5 figures 
Access Paper or Ask Questions

toon2real: Translating Cartoon Images to Realistic Images

Feb 01, 2021
K. M. Arefeen Sultan, Mohammad Imrul Jubair, MD. Nahidul Islam, Sayed Hossain Khan

In terms of Image-to-image translation, Generative Adversarial Networks (GANs) has achieved great success even when it is used in the unsupervised dataset. In this work, we aim to translate cartoon images to photo-realistic images using GAN. We apply several state-of-the-art models to perform this task; however, they fail to perform good quality translations. We observe that the shallow difference between these two domains causes this issue. Based on this idea, we propose a method based on CycleGAN model for image translation from cartoon domain to photo-realistic domain. To make our model efficient, we implemented Spectral Normalization which added stability in our model. We demonstrate our experimental results and show that our proposed model has achieved the lowest Frechet Inception Distance score and better results compared to another state-of-the-art technique, UNIT.

* Accepted as a short paper at ICTAI 2020 
Access Paper or Ask Questions

Image-to-image translation for cross-domain disentanglement

Nov 04, 2018
Abel Gonzalez-Garcia, Joost van de Weijer, Yoshua Bengio

Deep image translation methods have recently shown excellent results, outputting high-quality images covering multiple modes of the data distribution. There has also been increased interest in disentangling the internal representations learned by deep methods to further improve their performance and achieve a finer control. In this paper, we bridge these two objectives and introduce the concept of cross-domain disentanglement. We aim to separate the internal representation into three parts. The shared part contains information for both domains. The exclusive parts, on the other hand, contain only factors of variation that are particular to each domain. We achieve this through bidirectional image translation based on Generative Adversarial Networks and cross-domain autoencoders, a novel network component. Our model offers multiple advantages. We can output diverse samples covering multiple modes of the distributions of both domains, perform domain-specific image transfer and interpolation, and cross-domain retrieval without the need of labeled data, only paired images. We compare our model to the state-of-the-art in multi-modal image translation and achieve better results for translation on challenging datasets as well as for cross-domain retrieval on realistic datasets.

* Accepted to NIPS 2018 
Access Paper or Ask Questions