Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"Image To Image Translation": models, code, and papers

Federated CycleGAN for Privacy-Preserving Image-to-Image Translation

Jun 17, 2021
Joonyoung Song, Jong Chul Ye

Unsupervised image-to-image translation methods such as CycleGAN learn to convert images from one domain to another using unpaired training data sets from different domains. Unfortunately, these approaches still require centrally collected unpaired records, potentially violating privacy and security issues. Although the recent federated learning (FL) allows a neural network to be trained without data exchange, the basic assumption of the FL is that all clients have their own training data from a similar domain, which is different from our image-to-image translation scenario in which each client has images from its unique domain and the goal is to learn image translation between different domains without accessing the target domain data. To address this, here we propose a novel federated CycleGAN architecture that can learn image translation in an unsupervised manner while maintaining the data privacy. Specifically, our approach arises from a novel observation that CycleGAN loss can be decomposed into the sum of client specific local objectives that can be evaluated using only their data. This local objective decomposition allows multiple clients to participate in federated CycleGAN training without sacrificing performance. Furthermore, our method employs novel switchable generator and discriminator architecture using Adaptive Instance Normalization (AdaIN) that significantly reduces the band-width requirement of the federated learning. Our experimental results on various unsupervised image translation tasks show that our federated CycleGAN provides comparable performance compared to the non-federated counterpart.

Access Paper or Ask Questions

An Efficient Multi-Domain Framework for Image-to-Image Translation

Nov 28, 2019
Ye Lin, Keren Fu, Shenggui Ling, Cheng Peng

Existing approaches have been proposed to tackle unsupervised image-to-image translation in recent years. However, they mainly focus on one-to-one mappings, making it difficult to handle more general and practical problems such as multi-domain translations. To address issues like large cost of training time and resources in translation between any number of domains, we propose a general framework called multi-domain translator (MDT), which is extended from bi-directional image-to-image translation. MDT is designed to have only one domain-shared encoder for the consideration of efficiency, together with several domain-specified decoders to transform an image into multiple domains without knowing the input domain label. Moreover, we propose to employ two constraints, namely reconstruction loss and identity loss to further improve the generation. Experiments are conducted on different databases for several multi-domain translation tasks. Both qualitative and quantitative results demonstrate the effectiveness and efficiency performed by the proposed MDT against the state-of-the-art models.

* 18pages, 14 figures, 3 tables. Submitted to PATTERN RECOGN 
Access Paper or Ask Questions

Unsupervised Multi-Domain Multimodal Image-to-Image Translation with Explicit Domain-Constrained Disentanglement

Nov 02, 2019
Weihao Xia, Yujiu Yang, Jing-Hao Xue

Image-to-image translation has drawn great attention during the past few years. It aims to translate an image in one domain to a given reference image in another domain. Due to its effectiveness and efficiency, many applications can be formulated as image-to-image translation problems. However, three main challenges remain in image-to-image translation: 1) the lack of large amounts of aligned training pairs for different tasks; 2) the ambiguity of multiple possible outputs from a single input image; and 3) the lack of simultaneous training of multiple datasets from different domains within a single network. We also found in experiments that the implicit disentanglement of content and style could lead to unexpect results. In this paper, we propose a unified framework for learning to generate diverse outputs using unpaired training data and allow simultaneous training of multiple datasets from different domains via a single network. Furthermore, we also investigate how to better extract domain supervision information so as to learn better disentangled representations and achieve better image translation. Experiments show that the proposed method outperforms or is comparable with the state-of-the-art methods.

* 20 pages 
Access Paper or Ask Questions

LC-GAN: Image-to-image Translation Based on Generative Adversarial Network for Endoscopic Images

Mar 10, 2020
Shan Lin, Fangbo Qin, Yangming Li, Randall A. Bly, Kris S. Moe, Blake Hannaford

The intelligent perception of endoscopic vision is appealing in many computer-assisted and robotic surgeries. Achieving good vision-based analysis with deep learning techniques requires large labeled datasets, but manual data labeling is expensive and time-consuming in medical problems. When applying a trained model to a different but relevant dataset, a new labeled dataset may be required for training to avoid performance degradation. In this work, we investigate a novel cross-domain strategy to reduce the need for manual data labeling by proposing an image-to-image translation model called live-cadaver GAN (LC-GAN) based on generative adversarial networks (GANs). More specifically, we consider a situation when a labeled cadaveric surgery dataset is available while the task is instrument segmentation on a live surgery dataset. We train LC-GAN to learn the mappings between the cadaveric and live datasets. To achieve instrument segmentation on live images, we can first translate the live images to fake-cadaveric images with LC-GAN, and then perform segmentation on the fake-cadaveric images with models trained on the real cadaveric dataset. With this cross-domain strategy, we fully leverage the labeled cadaveric dataset for segmentation on live images without the need to label the live dataset again. Two generators with different architectures are designed for LC-GAN to make use of the deep feature representation learned from the cadaveric image based instrument segmentation task. Moreover, we propose structural similarity loss and segmentation consistency loss to improve the semantic consistency during translation. The results demonstrate that LC-GAN achieves better image-to-image translation results, and leads to improved segmentation performance in the proposed cross-domain segmentation task.

* Submitted to 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 
Access Paper or Ask Questions

MCMI: Multi-Cycle Image Translation with Mutual Information Constraints

Jul 06, 2020
Xiang Xu, Megha Nawhal, Greg Mori, Manolis Savva

We present a mutual information-based framework for unsupervised image-to-image translation. Our MCMI approach treats single-cycle image translation models as modules that can be used recurrently in a multi-cycle translation setting where the translation process is bounded by mutual information constraints between the input and output images. The proposed mutual information constraints can improve cross-domain mappings by optimizing out translation functions that fail to satisfy the Markov property during image translations. We show that models trained with MCMI produce higher quality images and learn more semantically-relevant mappings compared to state-of-the-art image translation methods. The MCMI framework can be applied to existing unpaired image-to-image translation models with minimum modifications. Qualitative experiments and a perceptual study demonstrate the image quality improvements and generality of our approach using several backbone models and a variety of image datasets.

Access Paper or Ask Questions

Translation-equivariant Image Quantizer for Bi-directional Image-Text Generation

Dec 01, 2021
Woncheol Shin, Gyubok Lee, Jiyoung Lee, Joonseok Lee, Edward Choi

Recently, vector-quantized image modeling has demonstrated impressive performance on generation tasks such as text-to-image generation. However, we discover that the current image quantizers do not satisfy translation equivariance in the quantized space due to aliasing, degrading performance in the downstream text-to-image generation and image-to-text generation, even in simple experimental setups. Instead of focusing on anti-aliasing, we take a direct approach to encourage translation equivariance in the quantized space. In particular, we explore a desirable property of image quantizers, called 'Translation Equivariance in the Quantized Space' and propose a simple but effective way to achieve translation equivariance by regularizing orthogonality in the codebook embedding vectors. Using this method, we improve accuracy by +22% in text-to-image generation and +26% in image-to-text generation, outperforming the VQGAN.

Access Paper or Ask Questions

Instance-wise Hard Negative Example Generation for Contrastive Learning in Unpaired Image-to-Image Translation

Aug 11, 2021
Weilun Wang, Wengang Zhou, Jianmin Bao, Dong Chen, Houqiang Li

Contrastive learning shows great potential in unpaired image-to-image translation, but sometimes the translated results are in poor quality and the contents are not preserved consistently. In this paper, we uncover that the negative examples play a critical role in the performance of contrastive learning for image translation. The negative examples in previous methods are randomly sampled from the patches of different positions in the source image, which are not effective to push the positive examples close to the query examples. To address this issue, we present instance-wise hard Negative Example Generation for Contrastive learning in Unpaired image-to-image Translation (NEGCUT). Specifically, we train a generator to produce negative examples online. The generator is novel from two perspectives: 1) it is instance-wise which means that the generated examples are based on the input image, and 2) it can generate hard negative examples since it is trained with an adversarial loss. With the generator, the performance of unpaired image-to-image translation is significantly improved. Experiments on three benchmark datasets demonstrate that the proposed NEGCUT framework achieves state-of-the-art performance compared to previous methods.

* Accepted by ICCV 2021 
Access Paper or Ask Questions

Towards Instance-level Image-to-Image Translation

May 05, 2019
Zhiqiang Shen, Mingyang Huang, Jianping Shi, Xiangyang Xue, Thomas Huang

Unpaired Image-to-image Translation is a new rising and challenging vision problem that aims to learn a mapping between unaligned image pairs in diverse domains. Recent advances in this field like MUNIT and DRIT mainly focus on disentangling content and style/attribute from a given image first, then directly adopting the global style to guide the model to synthesize new domain images. However, this kind of approaches severely incurs contradiction if the target domain images are content-rich with multiple discrepant objects. In this paper, we present a simple yet effective instance-aware image-to-image translation approach (INIT), which employs the fine-grained local (instance) and global styles to the target image spatially. The proposed INIT exhibits three import advantages: (1) the instance-level objective loss can help learn a more accurate reconstruction and incorporate diverse attributes of objects; (2) the styles used for target domain of local/global areas are from corresponding spatial regions in source domain, which intuitively is a more reasonable mapping; (3) the joint training process can benefit both fine and coarse granularity and incorporates instance information to improve the quality of global translation. We also collect a large-scale benchmark for the new instance-level translation task. We observe that our synthetic images can even benefit real-world vision tasks like generic object detection.

* Accepted to CVPR 2019. Project page: 
Access Paper or Ask Questions

Unsupervised Multi-Modal Medical Image Registration via Discriminator-Free Image-to-Image Translation

Apr 28, 2022
Zekang Chen, Jia Wei, Rui Li

In clinical practice, well-aligned multi-modal images, such as Magnetic Resonance (MR) and Computed Tomography (CT), together can provide complementary information for image-guided therapies. Multi-modal image registration is essential for the accurate alignment of these multi-modal images. However, it remains a very challenging task due to complicated and unknown spatial correspondence between different modalities. In this paper, we propose a novel translation-based unsupervised deformable image registration approach to convert the multi-modal registration problem to a mono-modal one. Specifically, our approach incorporates a discriminator-free translation network to facilitate the training of the registration network and a patchwise contrastive loss to encourage the translation network to preserve object shapes. Furthermore, we propose to replace an adversarial loss, that is widely used in previous multi-modal image registration methods, with a pixel loss in order to integrate the output of translation into the target modality. This leads to an unsupervised method requiring no ground-truth deformation or pairs of aligned images for training. We evaluate four variants of our approach on the public Learn2Reg 2021 datasets \cite{hering2021learn2reg}. The experimental results demonstrate that the proposed architecture achieves state-of-the-art performance. Our code is available at

* Accepted in IJCAI 2022 
Access Paper or Ask Questions

Flow-based Deformation Guidance for Unpaired Multi-Contrast MRI Image-to-Image Translation

Dec 03, 2020
Toan Duc Bui, Manh Nguyen, Ngan Le, Khoa Luu

Image synthesis from corrupted contrasts increases the diversity of diagnostic information available for many neurological diseases. Recently the image-to-image translation has experienced significant levels of interest within medical research, beginning with the successful use of the Generative Adversarial Network (GAN) to the introduction of cyclic constraint extended to multiple domains. However, in current approaches, there is no guarantee that the mapping between the two image domains would be unique or one-to-one. In this paper, we introduce a novel approach to unpaired image-to-image translation based on the invertible architecture. The invertible property of the flow-based architecture assures a cycle-consistency of image-to-image translation without additional loss functions. We utilize the temporal information between consecutive slices to provide more constraints to the optimization for transforming one domain to another in unpaired volumetric medical images. To capture temporal structures in the medical images, we explore the displacement between the consecutive slices using a deformation field. In our approach, the deformation field is used as a guidance to keep the translated slides realistic and consistent across the translation. The experimental results have shown that the synthesized images using our proposed approach are able to archive a competitive performance in terms of mean squared error, peak signal-to-noise ratio, and structural similarity index when compared with the existing deep learning-based methods on three standard datasets, i.e. HCP, MRBrainS13, and Brats2019.

* Medical Image Computing and Computer Assisted Interventions 
Access Paper or Ask Questions