Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"Image To Image Translation": models, code, and papers

Deep Image Translation for Enhancing Simulated Ultrasound Images

Jun 18, 2020
Lin Zhang, Tiziano Portenier, Christoph Paulus, Orcun Goksel

Ultrasound simulation based on ray tracing enables the synthesis of highly realistic images. It can provide an interactive environment for training sonographers as an educational tool. However, due to high computational demand, there is a trade-off between image quality and interactivity, potentially leading to sub-optimal results at interactive rates. In this work we introduce a deep learning approach based on adversarial training that mitigates this trade-off by improving the quality of simulated images with constant computation time. An image-to-image translation framework is utilized to translate low quality images into high quality versions. To incorporate anatomical information potentially lost in low quality images, we additionally provide segmentation maps to image translation. Furthermore, we propose to leverage information from acoustic attenuation maps to better preserve acoustic shadows and directional artifacts, an invaluable feature for ultrasound image interpretation. The proposed method yields an improvement of 7.2% in Fr\'{e}chet Inception Distance and 8.9% in patch-based Kullback-Leibler divergence.


Toward Learning a Unified Many-to-Many Mapping for Diverse Image Translation

May 21, 2019
Wenju Xu, Shawn Keshmiri, Guanghui Wang

Image-to-image translation, which translates input images to a different domain with a learned one-to-one mapping, has achieved impressive success in recent years. The success of translation mainly relies on the network architecture to reserve the structural information while modify the appearance slightly at the pixel level through adversarial training. Although these networks are able to learn the mapping, the translated images are predictable without exclusion. It is more desirable to diversify them using image-to-image translation by introducing uncertainties, i.e., the generated images hold potential for variations in colors and textures in addition to the general similarity to the input images, and this happens in both the target and source domains. To this end, we propose a novel generative adversarial network (GAN) based model, InjectionGAN, to learn a many-to-many mapping. In this model, the input image is combined with latent variables, which comprise of domain-specific attribute and unspecific random variations. The domain-specific attribute indicates the target domain of the translation, while the unspecific random variations introduce uncertainty into the model. A unified framework is proposed to regroup these two parts and obtain diverse generations in each domain. Extensive experiments demonstrate that the diverse generations have high quality for the challenging image-to-image translation tasks where no pairing information of the training dataset exits. Both quantitative and qualitative results prove the superior performance of InjectionGAN over the state-of-the-art approaches.


Latent Filter Scaling for Multimodal Unsupervised Image-to-Image Translation

Dec 24, 2018
Yazeed Alharbi, Neil Smith, Peter Wonka

In multimodal unsupervised image-to-image translation tasks, the goal is to translate an image from the source domain to many images in the target domain. We present a simple method that produces higher quality images than current state-of-the-art while maintaining the same amount of multimodal diversity. Previous methods follow the unconditional approach of trying to map the latent code directly to a full-size image. This leads to complicated network architectures with several introduced hyperparameters to tune. By treating the latent code as a modifier of the convolutional filters, we produce multimodal output while maintaining the traditional Generative Adversarial Network (GAN) loss and without additional hyperparameters. The only tuning required by our method controls the tradeoff between variability and quality of generated images. Furthermore, we achieve disentanglement between source domain content and target domain style for free as a by-product of our formulation. We perform qualitative and quantitative experiments showing the advantages of our method compared with the state-of-the art on multiple benchmark image-to-image translation datasets.


UVCGAN: UNet Vision Transformer cycle-consistent GAN for unpaired image-to-image translation

Mar 21, 2022
Dmitrii Torbunov, Yi Huang, Haiwang Yu, Jin Huang, Shinjae Yoo, Meifeng Lin, Brett Viren, Yihui Ren

Image-to-image translation has broad applications in art, design, and scientific simulations. The original CycleGAN model emphasizes one-to-one mapping via a cycle-consistent loss, while more recent works promote one-to-many mapping to boost the diversity of the translated images. With scientific simulation and one-to-one needs in mind, this work examines if equipping CycleGAN with a vision transformer (ViT) and employing advanced generative adversarial network (GAN) training techniques can achieve better performance. The resulting UNet ViT Cycle-consistent GAN (UVCGAN) model is compared with previous best-performing models on open benchmark image-to-image translation datasets, Selfie2Anime and CelebA. UVCGAN performs better and retains a strong correlation between the original and translated images. An accompanying ablation study shows that the gradient penalty and BERT-like pre-training also contribute to the improvement.~To promote reproducibility and open science, the source code, hyperparameter configurations, and pre-trained model will be made available at:

* 5 pages, 2 figures, 2 tables 

Image to Image Translation : Generating maps from satellite images

May 19, 2021
Vaishali Ingale, Rishabh Singh, Pragati Patwal

Generation of maps from satellite images is conventionally done by a range of tools. Maps became an important part of life whose conversion from satellite images may be a bit expensive but Generative models can pander to this challenge. These models aims at finding the patterns between the input and output image. Image to image translation is employed to convert satellite image to corresponding map. Different techniques for image to image translations like Generative adversarial network, Conditional adversarial networks and Co-Variational Auto encoders are used to generate the corresponding human-readable maps for that region, which takes a satellite image at a given zoom level as its input. We are training our model on Conditional Generative Adversarial Network which comprises of Generator model which which generates fake images while the discriminator tries to classify the image as real or fake and both these models are trained synchronously in adversarial manner where both try to fool each other and result in enhancing model performance.


Image Classification for Arabic: Assessing the Accuracy of Direct English to Arabic Translations

Jul 13, 2018
Abdulkareem Alsudais

Image classification is an ongoing research challenge. Most of the available research focuses on image classification for the English language, however there is very little research on image classification for the Arabic language. Expanding image classification to Arabic has several applications. The present study investigated a method for generating Arabic labels for images of objects. The method used in this study involved a direct English to Arabic translation of the labels that are currently available on ImageNet, a database commonly used in image classification research. The purpose of this study was to test the accuracy of this method. In this study, 2,887 labeled images were randomly selected from ImageNet. All of the labels were translated from English to Arabic using Google Translate. The accuracy of the translations was evaluated. Results indicated that that 65.6% of the Arabic labels were accurate. This study makes three important contributions to the image classification literature: (1) it determined the baseline level of accuracy for algorithms that provide Arabic labels for images, (2) it provided 1,895 images that are tagged with accurate Arabic labels, and (3) provided the accuracy of translations of image labels from English to Arabic.


Multi-Mapping Image-to-Image Translation with Central Biasing Normalization

Oct 11, 2018
Xiaoming Yu, Zhenqiang Ying, Thomas Li, Shan Liu, Ge Li

Image-to-image translation is a class of image processing and vision problems that translates an image to a different style or domain. To improve the capacity and performance of one-to-one translation models, multi-mapping image translation have been attempting to extend them for multiple mappings by injecting latent code. Through the analysis of the existing latent code injection models, we find that latent code can determine the target mapping of a generator by controlling the output statistical properties, especially the mean value. However, we find that in some cases the normalization will reduce the consistency of same mapping or the diversity of different mappings. After mathematical analysis, we find the reason behind that is that the distributions of same mapping become inconsistent after batch normalization, and that the effects of latent code are eliminated after instance normalization. To solve these problems, we propose consistency within diversity design criteria for multi-mapping networks. Based on the design criteria, we propose central biasing normalization (CBN) to replace existing latent code injection. CBN can be easily integrated into existing multi-mapping models, significantly reducing model parameters. Experiments show that the results of our method is more stable and diverse than that of existing models. .


Conditional Invertible Neural Networks for Diverse Image-to-Image Translation

May 05, 2021
Lynton Ardizzone, Jakob Kruse, Carsten Lüth, Niels Bracher, Carsten Rother, Ullrich Köthe

We introduce a new architecture called a conditional invertible neural network (cINN), and use it to address the task of diverse image-to-image translation for natural images. This is not easily possible with existing INN models due to some fundamental limitations. The cINN combines the purely generative INN model with an unconstrained feed-forward network, which efficiently preprocesses the conditioning image into maximally informative features. All parameters of a cINN are jointly optimized with a stable, maximum likelihood-based training procedure. Even though INN-based models have received far less attention in the literature than GANs, they have been shown to have some remarkable properties absent in GANs, e.g. apparent immunity to mode collapse. We find that our cINNs leverage these properties for image-to-image translation, demonstrated on day to night translation and image colorization. Furthermore, we take advantage of our bidirectional cINN architecture to explore and manipulate emergent properties of the latent space, such as changing the image style in an intuitive way.

* arXiv admin note: text overlap with arXiv:1907.02392 

TransferI2I: Transfer Learning for Image-to-Image Translation from Small Datasets

May 14, 2021
Yaxing Wang, Hector Laria Mantecon, Joost van de Weijer, Laura Lopez-Fuentes, Bogdan Raducanu

Image-to-image (I2I) translation has matured in recent years and is able to generate high-quality realistic images. However, despite current success, it still faces important challenges when applied to small domains. Existing methods use transfer learning for I2I translation, but they still require the learning of millions of parameters from scratch. This drawback severely limits its application on small domains. In this paper, we propose a new transfer learning for I2I translation (TransferI2I). We decouple our learning process into the image generation step and the I2I translation step. In the first step we propose two novel techniques: source-target initialization and self-initialization of the adaptor layer. The former finetunes the pretrained generative model (e.g., StyleGAN) on source and target data. The latter allows to initialize all non-pretrained network parameters without the need of any data. These techniques provide a better initialization for the I2I translation step. In addition, we introduce an auxiliary GAN that further facilitates the training of deep I2I systems even from small datasets. In extensive experiments on three datasets, (Animal faces, Birds, and Foods), we show that we outperform existing methods and that mFID improves on several datasets with over 25 points.

* Technical report