"Image To Image Translation": models, code, and papers

Towards Multimodal Simultaneous Neural Machine Translation

Apr 07, 2020
Aizhan Imankulova, Masahiro Kaneko, Tosho Hirasawa, Mamoru Komachi

Simultaneous translation involves translating a sentence before the speaker's utterance is completed in order to realize real-time understanding in multiple languages. This task is significantly harder than general full-sentence translation because of the shortage of input information during decoding. To alleviate this shortage, we propose multimodal simultaneous neural machine translation (MSNMT), which leverages visual information as an additional modality. Although the usefulness of images as an additional modality is moderate for full-sentence translation, we verified, for the first time, its importance for simultaneous translation. Our experiments with the Multi30k dataset showed that MSNMT in a simultaneous setting significantly outperforms its text-only counterpart in situations where 5 or fewer input tokens are needed to begin translation. We then verified the importance of visual information during decoding by (a) performing an adversarial evaluation of MSNMT, in which we studied how models behave with an incongruent input modality, and (b) analyzing the image attention.


Dual Path Learning for Domain Adaptation of Semantic Segmentation

Aug 13, 2021
Yiting Cheng, Fangyun Wei, Jianmin Bao, Dong Chen, Fang Wen, Wenqiang Zhang

Domain adaptation for semantic segmentation alleviates the need for large-scale pixel-wise annotations. Recently, self-supervised learning (SSL) combined with image-to-image translation has shown great effectiveness in adaptive segmentation. The most common practice is to perform SSL along with image translation to align a single domain (the source or the target). However, in this single-domain paradigm, the unavoidable visual inconsistency introduced by image translation may affect subsequent learning. In this paper, based on the observation that domain adaptation frameworks performed in the source and target domains are almost complementary in terms of image translation and SSL, we propose a novel dual path learning (DPL) framework to alleviate visual inconsistency. Concretely, DPL contains two complementary and interactive single-domain adaptation pipelines, aligned in the source and target domains respectively. Inference with DPL is extremely simple: only one segmentation model in the target domain is employed. Novel techniques such as dual path image translation and dual path adaptive segmentation are proposed to make the two paths promote each other in an interactive manner. Experiments on the GTA5→Cityscapes and SYNTHIA→Cityscapes scenarios demonstrate the superiority of our DPL model over state-of-the-art methods. The code and models are available at: \url{}

* Accepted by ICCV 2021 

Image Embedded Segmentation: Combining Supervised and Unsupervised Objectives through Generative Adversarial Networks

Jan 30, 2020
C. T. Sari, G. N. Gunesli, C. Sokmensuer, C. Gunduz-Demir

This paper presents a new regularization method to train a fully convolutional network for semantic tissue segmentation in histopathological images. This method benefits from unsupervised learning, in the form of image reconstruction, during network training. To this end, it defines a new embedding that unites the main supervised task of semantic segmentation and an auxiliary unsupervised task of image reconstruction into a single task, and proposes to learn this united task with a single generative model. This embedding generates a multi-channel output image by superimposing an original input image on its segmentation map. The method then learns to translate the input image to this embedded output image using a conditional generative adversarial network, which is known to be quite effective for image-to-image translation. This proposal differs from the existing approach that uses image reconstruction for the same regularization purpose. The existing approach considers segmentation and image reconstruction as two separate tasks in a multi-task network, defines their losses independently, and then combines these losses in a joint loss function. However, defining such a function requires externally determining the relative contributions of the supervised and unsupervised losses that yield balanced learning between the segmentation and image reconstruction tasks. The proposed approach eliminates this difficulty by uniting these two tasks into a single one, which intrinsically combines their losses. Using histopathological image segmentation as a showcase application, our experiments demonstrate that the proposed approach leads to better segmentation results.

* This work has been submitted to the IEEE for possible publication 

Photo-to-Shape Material Transfer for Diverse Structures

May 09, 2022
Ruizhen Hu, Xiangyu Su, Xiangkai Chen, Oliver Van Kaick, Hui Huang

We introduce a method for assigning photorealistic relightable materials to 3D shapes in an automatic manner. Our method takes as input a photo exemplar of a real object and a 3D object with segmentation, and uses the exemplar to guide the assignment of materials to the parts of the shape, so that the appearance of the resulting shape is as similar as possible to the exemplar. To accomplish this goal, our method combines an image translation neural network with a material assignment neural network. The image translation network translates the color from the exemplar to a projection of the 3D shape and the part segmentation from the projection to the exemplar. Then, the material prediction network assigns materials from a collection of realistic materials to the projected parts, based on the translated images and perceptual similarity of the materials. One key idea of our method is to use the translation network to establish a correspondence between the exemplar and shape projection, which allows us to transfer materials between objects with diverse structures. Another key idea of our method is to use the two pairs of (color, segmentation) images provided by the image translation to guide the material assignment, which enables us to ensure the consistency in the assignment. We demonstrate that our method allows us to assign materials to shapes so that their appearances better resemble the input exemplars, improving the quality of the results over the state-of-the-art method, and allowing us to automatically create thousands of shapes with high-quality photorealistic materials. Code and data for this paper are available at


Combining Noise-to-Image and Image-to-Image GANs: Brain MR Image Augmentation for Tumor Detection

May 31, 2019
Changhee Han, Leonardo Rundo, Ryosuke Araki, Yudai Nagano, Yujiro Furukawa, Giancarlo Mauri, Hideki Nakayama, Hideaki Hayashi

Convolutional Neural Networks (CNNs) can achieve excellent computer-assisted diagnosis performance, provided that sufficient annotated training data are available. Unfortunately, most medical imaging datasets, often collected from various scanners, are small and fragmented. In this context, as a Data Augmentation (DA) technique, Generative Adversarial Networks (GANs) can synthesize realistic/diverse additional training images to fill gaps in the real image distribution; researchers have improved classification by augmenting images with noise-to-image GANs (e.g., random noise samples to diverse pathological images) or image-to-image GANs (e.g., a benign image to a malignant one). Yet, no research has reported results combining (i) noise-to-image GANs and image-to-image GANs or (ii) GANs and other deep generative models for a further performance boost. Therefore, to maximize the DA effect with GAN combinations, we propose a two-step GAN-based DA that generates and refines brain MR images with/without tumors separately: (i) Progressive Growing of GANs (PGGANs), a multi-stage noise-to-image GAN for high-resolution image generation, first generates realistic/diverse 256 × 256 images; even a physician cannot accurately distinguish them from real ones in a Visual Turing Test; (ii) UNsupervised Image-to-image Translation or SimGAN, image-to-image GANs that combine GANs with Variational AutoEncoders or use a GAN loss for DA, respectively, further refines the texture/shape of the PGGAN-generated images to resemble the real ones. We thoroughly investigate CNN-based tumor classification results, also considering the influence of pre-training on ImageNet and of discarding weird-looking GAN-generated images. The results show that, when combined with classic DA, our two-step GAN-based DA can significantly outperform classic DA alone in tumor detection (i.e., boosting sensitivity from 93.63% to 97.53%) and also in other tasks.

* 9 pages, 7 figures, submitted to IEEE ACCESS 

Adapting to Unseen Vendor Domains for MRI Lesion Segmentation

Aug 14, 2021
Brandon Mac, Alan R. Moody, April Khademi

One of the key limitations of machine learning models is poor performance on data outside the domain of the training distribution. This is especially true for image analysis in magnetic resonance (MR) imaging, as variations in hardware and software create non-standard intensities, contrasts, and noise distributions across scanners. Recently, image translation models have been proposed to augment data across domains to create synthetic data points. In this paper, we investigate the application of an unsupervised image translation model to augment MR images from a source dataset to a target dataset. Specifically, we want to evaluate how well these models can create synthetic data points representative of the target dataset through image translation, and to see whether a segmentation model trained on these synthetic data points would approach the performance of a model trained directly on the target dataset. We consider three configurations of augmentation between datasets: translation between images, between scanner vendors, and from labels to images. The segmentation models trained on synthetic data from the labels-to-images configuration yielded the closest performance to the segmentation model trained directly on the target dataset. The Dice coefficient scores for the target vendors (GE, Siemens, Philips) were 0.63, 0.64, and 0.58 when training on synthetic data, compared to 0.65, 0.72, and 0.61 when training directly on the target dataset.
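The Dice coefficient reported above measures the overlap between a predicted mask and a reference mask. The following is a minimal sketch of the standard definition, 2|A∩B| / (|A| + |B|), for binary masks. The function name, the flat-list mask representation, and the empty-mask convention are assumptions made for illustration and are not taken from the paper.

```python
def dice_coefficient(pred, target):
    """Dice = 2|A∩B| / (|A| + |B|) for two binary masks given as flat sequences of 0/1."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels that are foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground pixels across both masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy example: 3 foreground pixels in each mask, 2 of them shared.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
score = dice_coefficient(pred, target)
print(round(score, 4))
```

In practice the score would be averaged over the test volumes of each vendor; how the paper aggregates it is not stated in the abstract.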


Detecting GAN generated Fake Images using Co-occurrence Matrices

Mar 15, 2019
Lakshmanan Nataraj, Tajuddin Manhar Mohammed, B. S. Manjunath, Shivkumar Chandrasekaran, Arjuna Flenner, Jawadul H. Bappy, Amit K. Roy-Chowdhury

The advent of Generative Adversarial Networks (GANs) has brought about completely novel ways of transforming and manipulating pixels in digital images. GAN-based techniques such as image-to-image translation, DeepFakes, and other automated methods have become increasingly popular for creating fake images. In this paper, we propose a novel approach to detect GAN-generated fake images using a combination of co-occurrence matrices and deep learning. We extract co-occurrence matrices on three color channels in the pixel domain and train a model using a deep convolutional neural network (CNN) framework. Experimental results on two diverse and challenging GAN datasets comprising more than 56,000 images, based on unpaired image-to-image translation (CycleGAN [1]) and facial attributes/expressions (StarGAN [2]), show that our approach is promising and achieves more than 99% classification accuracy on both datasets. Further, our approach also generalizes well and achieves good results when trained on one dataset and tested on the other.
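To make the feature extraction step concrete, here is a minimal sketch of a pixel co-occurrence matrix for one color channel: entry (i, j) counts how often a pixel of value i has a neighbor of value j at a fixed offset. The horizontal offset (0, 1), the `levels` parameter, and the function name are assumptions for illustration; the abstract does not specify which offsets the authors use.

```python
def cooccurrence_matrix(channel, levels=256, offset=(0, 1)):
    """Count value pairs (channel[r][c], channel[r+dr][c+dc]) over one 2D channel.

    channel: 2D list of integers in [0, levels).
    offset:  (dr, dc) displacement to the neighbor; (0, 1) is the right-hand neighbor.
    """
    dr, dc = offset
    rows, cols = len(channel), len(channel[0])
    counts = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            # Skip pairs whose neighbor falls outside the image.
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[channel[r][c]][channel[r2][c2]] += 1
    return counts


# Toy 3x3 channel with only three intensity levels.
red = [[0, 0, 1],
       [1, 2, 2],
       [0, 1, 1]]
matrix = cooccurrence_matrix(red, levels=3)
for row in matrix:
    print(row)
```

For an RGB image, one such matrix would be computed per channel and the three matrices stacked as the input to the classifier.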


C-MADA: Unsupervised Cross-Modality Adversarial Domain Adaptation framework for medical Image Segmentation

Oct 29, 2021
Maria Baldeon-Calisto, Susana K. Lai-Yuen

Deep learning models have obtained state-of-the-art results for medical image analysis. However, when these models are tested on an unseen domain, there is a significant performance degradation. In this work, we present an unsupervised Cross-Modality Adversarial Domain Adaptation (C-MADA) framework for medical image segmentation. C-MADA implements image-level and feature-level adaptation in a sequential manner. First, images from the source domain are translated to the target domain through unpaired image-to-image adversarial translation with a cycle-consistency loss. Then, a U-Net network is trained with the mapped source domain images and target domain images in an adversarial manner to learn domain-invariant feature representations. Furthermore, to improve the network's segmentation performance, information about the shape, texture, and contour of the predicted segmentation is included during the adversarial training. C-MADA is tested on the task of brain MRI segmentation, obtaining competitive results.

* 5 pages, 1 figure 

Exploring Unlabeled Faces for Novel Attribute Discovery

Dec 06, 2019
Hyojin Bahng, Sunghyo Chung, Seungjoo Yoo, Jaegul Choo

Despite remarkable success in unpaired image-to-image translation, existing systems still require a large amount of labeled images. This is a bottleneck for their real-world applications; in practice, a model trained on the labeled CelebA dataset does not work well for test images from a different distribution, which greatly limits its application to the much larger quantity of unlabeled images. In this paper, we attempt to alleviate this necessity for labeled data in the facial image translation domain. We aim to explore the degree to which one can discover novel attributes from unlabeled faces and perform high-quality translation. To this end, we use prior knowledge about the visual world as guidance to discover novel attributes and transfer them via a novel normalization method. Experiments show that our method, trained on unlabeled data, produces high-quality translations that preserve identity and are as perceptually realistic as, or more realistic than, those of state-of-the-art methods trained on labeled data.

* 10 pages, 6 figures 

Geometry-Consistent Adversarial Networks for One-Sided Unsupervised Domain Mapping

Sep 16, 2018
Huan Fu, Mingming Gong, Chaohui Wang, Kayhan Batmanghelich, Kun Zhang, Dacheng Tao

Unsupervised domain mapping aims at learning a function to translate domain X to Y (GXY : X to Y) in the absence of paired (X, Y) samples. Finding the optimal GXY without paired data is an ill-posed problem, and hence appropriate constraints are required to obtain reasonable solutions. One of the most prominent constraints is cycle-consistency, which requires that the image translated by GXY be translated back to the input image by an inverse mapping GYX. While cycle-consistency requires simultaneous training of GXY and GYX, recent methods have demonstrated that one-sided domain mapping (learning only GXY) can be achieved by preserving pairwise distances between images before and after translation. Although cycle-consistency and distance preservation successfully constrain the solution space, they overlook a special property of images: simple geometric transformations do not change the semantics of an image. Based on this property, we develop a geometry-consistent adversarial network (GcGAN), which enables one-sided unsupervised domain mapping. Our GcGAN takes the original image and its counterpart image transformed by a predefined geometric transformation as inputs and generates two images in the new domain, coupled by the corresponding geometry-consistency constraint. The geometry-consistency constraint eliminates unreasonable solutions and produces more reliable solutions. Quantitative comparisons against the baseline (GAN alone) and state-of-the-art methods, including DistanceGAN and CycleGAN, demonstrate the superiority of our method in generating realistic images.
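The geometry-consistency idea can be sketched as a penalty on the difference between "translate, then transform" and "transform, then translate". The code below is an illustrative toy under stated assumptions, not the authors' implementation: the 90-degree clockwise rotation as the predefined transformation, the mean L1 distance, the single shared generator, and both toy generators are hypothetical choices made for this example.

```python
def rotate90(img):
    """Rotate a 2D list clockwise by 90 degrees (the assumed predefined transformation)."""
    return [list(row) for row in zip(*img[::-1])]


def mean_l1(a, b):
    """Mean absolute difference between two equally sized 2D lists."""
    total = sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return total / (len(a) * len(a[0]))


def geometry_consistency_loss(generator, image, transform=rotate90):
    """Penalize disagreement between transform(G(x)) and G(transform(x))."""
    translated_then_transformed = transform(generator(image))
    transformed_then_translated = generator(transform(image))
    return mean_l1(translated_then_transformed, transformed_then_translated)


def invert(img):
    """Toy generator acting pixel by pixel, so it commutes with rotation."""
    return [[1 - p for p in row] for row in img]


def column_ramp(img):
    """Toy generator that adds the column index, so its output depends on orientation."""
    return [[p + c for c, p in enumerate(row)] for row in img]


image = [[0, 0],
         [0, 0]]
loss_invert = geometry_consistency_loss(invert, image)
loss_ramp = geometry_consistency_loss(column_ramp, image)
print(loss_invert, loss_ramp)
```

The pixel-wise generator incurs no penalty, while the orientation-dependent one does, which is the sense in which the constraint rules out mappings that do not respect image geometry. In training, such a term would be added to the adversarial loss.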
