"Image To Image Translation": models, code, and papers

Latent Filter Scaling for Multimodal Unsupervised Image-to-Image Translation

Dec 24, 2018
Yazeed Alharbi, Neil Smith, Peter Wonka

In multimodal unsupervised image-to-image translation tasks, the goal is to translate an image from the source domain to many images in the target domain. We present a simple method that produces higher quality images than the current state of the art while maintaining the same amount of multimodal diversity. Previous methods follow the unconditional approach of trying to map the latent code directly to a full-size image. This leads to complicated network architectures with several introduced hyperparameters to tune. By treating the latent code as a modifier of the convolutional filters, we produce multimodal output while maintaining the traditional Generative Adversarial Network (GAN) loss and without additional hyperparameters. The only tuning required by our method controls the tradeoff between variability and quality of generated images. Furthermore, we achieve disentanglement between source domain content and target domain style for free as a by-product of our formulation. We perform qualitative and quantitative experiments showing the advantages of our method compared with the state of the art on multiple benchmark image-to-image translation datasets.
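
The idea of treating the latent code as a modifier of the convolutional filters can be illustrated with a toy 1D convolution. This is a minimal sketch and not the authors' implementation: the function names, the one-scalar-per-filter `scales` vector, and the way those scalars would be derived from a latent code are all assumptions made for illustration.

```python
def conv1d(signal, kernel):
    """Valid 1D cross-correlation of a signal with one kernel."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def scaled_conv_layer(signal, kernels, scales):
    """Apply each kernel after multiplying it by its own scalar.

    `scales` stands in for values derived from a latent code
    (hypothetical): one scalar per filter.
    """
    return [
        conv1d(signal, [s * w for w in kernel])
        for kernel, s in zip(kernels, scales)
    ]


signal = [1.0, 2.0, 3.0, 4.0]
kernels = [[1.0, -1.0], [0.5, 0.5]]

# Two different "latent codes" yield two different outputs for the same input.
out_a = scaled_conv_layer(signal, kernels, scales=[1.0, 1.0])
out_b = scaled_conv_layer(signal, kernels, scales=[2.0, 0.0])
```

In this toy setting the same input and the same learned kernels produce different feature maps depending only on the scales, which is one way a single generator could emit several outputs per source image.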


UVCGAN: UNet Vision Transformer cycle-consistent GAN for unpaired image-to-image translation

Mar 21, 2022
Dmitrii Torbunov, Yi Huang, Haiwang Yu, Jin Huang, Shinjae Yoo, Meifeng Lin, Brett Viren, Yihui Ren

Image-to-image translation has broad applications in art, design, and scientific simulations. The original CycleGAN model emphasizes one-to-one mapping via a cycle-consistent loss, while more recent works promote one-to-many mapping to boost the diversity of the translated images. With scientific simulation and one-to-one needs in mind, this work examines if equipping CycleGAN with a vision transformer (ViT) and employing advanced generative adversarial network (GAN) training techniques can achieve better performance. The resulting UNet ViT Cycle-consistent GAN (UVCGAN) model is compared with previous best-performing models on open benchmark image-to-image translation datasets, Selfie2Anime and CelebA. UVCGAN performs better and retains a strong correlation between the original and translated images. An accompanying ablation study shows that the gradient penalty and BERT-like pre-training also contribute to the improvement. To promote reproducibility and open science, the source code, hyperparameter configurations, and pre-trained model will be made available at:

* 5 pages, 2 figures, 2 tables 
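
The cycle-consistent loss mentioned in the abstract can be sketched as follows. This is a minimal illustration assuming an L1 reconstruction penalty on flat pixel lists; the toy "generators" below are hypothetical stand-ins for learned networks and are not taken from UVCGAN.

```python
def l1_distance(a, b):
    """Mean absolute difference between two equally sized pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(x, gen_ab, gen_ba):
    """L1 distance between x and its reconstruction gen_ba(gen_ab(x))."""
    return l1_distance(x, gen_ba(gen_ab(x)))


# Toy stand-ins for the two generators (hypothetical, not learned networks).
def brighten(img):
    return [p + 0.25 for p in img]


def darken(img):
    return [p - 0.25 for p in img]


def darken_too_much(img):
    return [p - 0.5 for p in img]


image = [0.0, 0.25, 0.5, 0.75]
perfect_cycle = cycle_consistency_loss(image, brighten, darken)
imperfect_cycle = cycle_consistency_loss(image, brighten, darken_too_much)
```

A pair of mappings that invert each other gives zero loss, while a mismatched pair is penalized, which is what pushes the translation toward a one-to-one mapping.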

Image to Image Translation : Generating maps from satellite images

May 19, 2021
Vaishali Ingale, Rishabh Singh, Pragati Patwal

Maps are conventionally generated from satellite images with a range of tools. Maps have become an important part of daily life, and producing them from satellite images can be expensive, but generative models can address this challenge. These models aim to find the patterns between the input and output images. Image-to-image translation is employed to convert a satellite image into the corresponding map. Different image-to-image translation techniques, such as generative adversarial networks, conditional adversarial networks, and Co-Variational Autoencoders, are used to generate the corresponding human-readable map for a region, taking a satellite image at a given zoom level as input. We train our model as a Conditional Generative Adversarial Network, which comprises a generator that produces fake images and a discriminator that tries to classify each image as real or fake. Both models are trained simultaneously in an adversarial manner, each trying to fool the other, which improves model performance.
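
The adversarial objective described above can be sketched with the standard binary cross-entropy GAN losses. This is a minimal sketch under assumptions: the abstract does not state the exact loss used, and the discriminator scores below are hypothetical numbers rather than outputs of a trained network.

```python
import math


def bce(prediction, target):
    """Binary cross-entropy for one probability strictly between 0 and 1."""
    return -(target * math.log(prediction)
             + (1 - target) * math.log(1 - prediction))


def discriminator_loss(d_real, d_fake):
    """The discriminator wants real pairs scored 1 and generated pairs scored 0."""
    return bce(d_real, 1.0) + bce(d_fake, 0.0)


def generator_loss(d_fake):
    """The generator wants the discriminator to score its output as real."""
    return bce(d_fake, 1.0)


# d_real / d_fake are hypothetical discriminator outputs for
# (satellite image, real map) and (satellite image, generated map) pairs.
confident_d = discriminator_loss(d_real=0.9, d_fake=0.1)
fooled_d = discriminator_loss(d_real=0.5, d_fake=0.5)
```

The two losses pull in opposite directions: a discriminator score that lowers `discriminator_loss` on a generated map raises `generator_loss` for the same map.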


Image Classification for Arabic: Assessing the Accuracy of Direct English to Arabic Translations

Jul 13, 2018
Abdulkareem Alsudais

Image classification is an ongoing research challenge. Most of the available research focuses on image classification for the English language; however, there is very little research on image classification for the Arabic language. Expanding image classification to Arabic has several applications. The present study investigated a method for generating Arabic labels for images of objects. The method used in this study involved a direct English to Arabic translation of the labels that are currently available on ImageNet, a database commonly used in image classification research. The purpose of this study was to test the accuracy of this method. In this study, 2,887 labeled images were randomly selected from ImageNet. All of the labels were translated from English to Arabic using Google Translate. The accuracy of the translations was evaluated. Results indicated that 65.6% of the Arabic labels were accurate. This study makes three important contributions to the image classification literature: (1) it determined the baseline level of accuracy for algorithms that provide Arabic labels for images, (2) it provided 1,895 images that are tagged with accurate Arabic labels, and (3) it provided the accuracy of translations of image labels from English to Arabic.
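
The reported figures are mutually consistent, as a quick check shows. The two counts come from the abstract; the variable names are illustrative only.

```python
total_images = 2887      # labeled images sampled from ImageNet (from the abstract)
accurate_labels = 1895   # images with accurate Arabic labels (from the abstract)

# Share of accurate translations, rounded to one decimal place.
accuracy_percent = round(100 * accurate_labels / total_images, 1)

# Labels whose direct translation was judged inaccurate.
inaccurate_labels = total_images - accurate_labels
```

Here `accuracy_percent` comes out to the 65.6% stated in the abstract.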


Multi-Mapping Image-to-Image Translation with Central Biasing Normalization

Oct 11, 2018
Xiaoming Yu, Zhenqiang Ying, Thomas Li, Shan Liu, Ge Li

Image-to-image translation is a class of image processing and vision problems that translates an image to a different style or domain. To improve the capacity and performance of one-to-one translation models, multi-mapping image translation methods have attempted to extend them to multiple mappings by injecting a latent code. Through analysis of existing latent code injection models, we find that the latent code can determine the target mapping of a generator by controlling the statistical properties of the output, especially the mean value. However, we find that in some cases normalization reduces the consistency of the same mapping or the diversity of different mappings. After mathematical analysis, we find that the reason is that the distributions of the same mapping become inconsistent after batch normalization, and that the effects of the latent code are eliminated after instance normalization. To solve these problems, we propose consistency within diversity design criteria for multi-mapping networks. Based on the design criteria, we propose central biasing normalization (CBN) to replace existing latent code injection. CBN can be easily integrated into existing multi-mapping models, significantly reducing model parameters. Experiments show that the results of our method are more stable and diverse than those of existing models.
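
The claim that instance normalization eliminates the effect of the latent code can be seen on a single toy channel. This is a minimal sketch under assumptions: the latent code is reduced to one scalar bias per channel, and the post-normalization bias uses `tanh` to keep it bounded. Both choices are illustrative and the exact form used in the paper may differ.

```python
import math


def instance_norm(channel, eps=1e-5):
    """Normalize one channel to zero mean and approximately unit variance."""
    mean = sum(channel) / len(channel)
    var = sum((v - mean) ** 2 for v in channel) / len(channel)
    return [(v - mean) / math.sqrt(var + eps) for v in channel]


def bias_then_norm(channel, latent_bias):
    """Inject the latent code as a bias before normalization."""
    return instance_norm([v + latent_bias for v in channel])


def central_biasing_norm(channel, latent_bias):
    """Normalize first, then add a bounded bias derived from the latent code."""
    return [v + math.tanh(latent_bias) for v in instance_norm(channel)]


channel = [1.0, 2.0, 3.0, 4.0]
before_a = bias_then_norm(channel, 0.5)
before_b = bias_then_norm(channel, -0.5)
after_a = central_biasing_norm(channel, 0.5)
after_b = central_biasing_norm(channel, -0.5)
```

Subtracting the channel mean cancels any constant added beforehand, so `before_a` and `before_b` coincide, whereas adding the bias after normalization leaves the two latent codes with different output means.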


Conditional Invertible Neural Networks for Diverse Image-to-Image Translation

May 05, 2021
Lynton Ardizzone, Jakob Kruse, Carsten Lüth, Niels Bracher, Carsten Rother, Ullrich Köthe

We introduce a new architecture called a conditional invertible neural network (cINN), and use it to address the task of diverse image-to-image translation for natural images. This is not easily possible with existing INN models due to some fundamental limitations. The cINN combines the purely generative INN model with an unconstrained feed-forward network, which efficiently preprocesses the conditioning image into maximally informative features. All parameters of a cINN are jointly optimized with a stable, maximum likelihood-based training procedure. Even though INN-based models have received far less attention in the literature than GANs, they have been shown to have some remarkable properties absent in GANs, e.g. apparent immunity to mode collapse. We find that our cINNs leverage these properties for image-to-image translation, demonstrated on day to night translation and image colorization. Furthermore, we take advantage of our bidirectional cINN architecture to explore and manipulate emergent properties of the latent space, such as changing the image style in an intuitive way.

* arXiv admin note: text overlap with arXiv:1907.02392 
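
Invertibility under conditioning can be illustrated with a single conditional affine coupling step, a common building block of invertible networks. This is a sketch under assumptions and not the architecture from the paper: `scale_and_shift` is a hypothetical hand-written stand-in for a learned subnetwork, and the condition is a short list of numbers rather than features of a conditioning image.

```python
import math


def scale_and_shift(x1, cond):
    """Hypothetical stand-in for the subnetwork that reads x1 and the condition."""
    s = [math.tanh(a + c) for a, c in zip(x1, cond)]
    t = [a * c for a, c in zip(x1, cond)]
    return s, t


def coupling_forward(x, cond):
    """Keep the first half, affinely transform the second half."""
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]
    s, t = scale_and_shift(x1, cond)
    y2 = [b * math.exp(si) + ti for b, si, ti in zip(x2, s, t)]
    log_det = sum(s)  # log |det Jacobian| of this step
    return x1 + y2, log_det


def coupling_inverse(y, cond):
    """Undo coupling_forward exactly, given the same condition."""
    half = len(y) // 2
    y1, y2 = y[:half], y[half:]
    s, t = scale_and_shift(y1, cond)
    x2 = [(b - ti) * math.exp(-si) for b, si, ti in zip(y2, s, t)]
    return y1 + x2


x = [0.5, -1.0, 2.0, 0.3]
cond = [1.0, 0.5]
y, log_det = coupling_forward(x, cond)
recovered = coupling_inverse(y, cond)
```

Because the log-determinant is available in closed form, a stack of such steps can be trained by maximum likelihood, and the same weights serve both directions.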

TransferI2I: Transfer Learning for Image-to-Image Translation from Small Datasets

May 14, 2021
Yaxing Wang, Hector Laria Mantecon, Joost van de Weijer, Laura Lopez-Fuentes, Bogdan Raducanu

Image-to-image (I2I) translation has matured in recent years and is able to generate high-quality realistic images. However, despite current success, it still faces important challenges when applied to small domains. Existing methods use transfer learning for I2I translation, but they still require the learning of millions of parameters from scratch. This drawback severely limits their application on small domains. In this paper, we propose a new transfer learning method for I2I translation (TransferI2I). We decouple our learning process into the image generation step and the I2I translation step. In the first step we propose two novel techniques: source-target initialization and self-initialization of the adaptor layer. The former finetunes the pretrained generative model (e.g., StyleGAN) on source and target data. The latter allows all non-pretrained network parameters to be initialized without the need for any data. These techniques provide a better initialization for the I2I translation step. In addition, we introduce an auxiliary GAN that further facilitates the training of deep I2I systems even from small datasets. In extensive experiments on three datasets (Animal faces, Birds, and Foods), we show that we outperform existing methods and that mFID improves on several datasets by over 25 points.

* Technical report 

Segmentation Guided Image-to-Image Translation with Adversarial Networks

Jan 06, 2019
Songyao Jiang, Zhiqiang Tao, Yun Fu

Recently, image-to-image translation, which aims to map images in one domain to another specific one, has received increasing attention. Existing methods mainly solve this task via a deep generative model and focus on exploring the relationship between different domains. However, these methods neglect to utilize higher-level and instance-specific information to guide the training process, leading to a great deal of unrealistic generated images of low quality. Existing methods also lack spatial controllability during translation. To address these challenges, we propose novel Segmentation Guided Generative Adversarial Networks (SGGAN), which leverage semantic segmentation to further boost the generation performance and provide spatial mapping. In particular, a segmentor network is designed to impose semantic information on the generated images. Experimental results on a multi-domain face image translation task empirically demonstrate our method's capacity for spatial modification and its superiority in image quality over several state-of-the-art methods.

* Submitted to an IEEE conference 