Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"Image To Image Translation": models, code, and papers

OmiTrans: generative adversarial networks based omics-to-omics translation framework

Nov 27, 2021
Xiaoyu Zhang, Yike Guo

With the rapid development of high-throughput experimental technologies, different types of omics (e.g., genomics, epigenomics, transcriptomics, proteomics, and metabolomics) data can be produced from clinical samples. The correlations between different omics types attracts a lot of research interest, whereas the stduy on genome-wide omcis data translation (i.e, generation and prediction of one type of omics data from another type of omics data) is almost blank. Generative adversarial networks and the variants are one of the most state-of-the-art deep learning technologies, which have shown great success in image-to-image translation, text-to-image translation, etc. Here we proposed OmiTrans, a deep learning framework adopted the idea of generative adversarial networks to achieve omics-to-omics translation with promising results. OmiTrans was able to faithfully reconstruct gene expression profiles from DNA methylation data with high accuracy and great model generalisation, as demonstrated in the experiments.

* 9 pages, 9 figures 
Access Paper or Ask Questions

Unrestricted Facial Geometry Reconstruction Using Image-to-Image Translation

Sep 15, 2017
Matan Sela, Elad Richardson, Ron Kimmel

It has been recently shown that neural networks can recover the geometric structure of a face from a single given image. A common denominator of most existing face geometry reconstruction methods is the restriction of the solution space to some low-dimensional subspace. While such a model significantly simplifies the reconstruction problem, it is inherently limited in its expressiveness. As an alternative, we propose an Image-to-Image translation network that jointly maps the input image to a depth image and a facial correspondence map. This explicit pixel-based mapping can then be utilized to provide high quality reconstructions of diverse faces under extreme expressions, using a purely geometric refinement process. In the spirit of recent approaches, the network is trained only with synthetic data, and is then evaluated on in-the-wild facial images. Both qualitative and quantitative analyses demonstrate the accuracy and the robustness of our approach.

* To appear in ICCV 2017 
Access Paper or Ask Questions

UMONS Submission for WMT18 Multimodal Translation Task

Oct 15, 2018
Jean-Benoit Delbrouck, Stéphane Dupont

This paper describes the UMONS solution for the Multimodal Machine Translation Task presented at the third conference on machine translation (WMT18). We explore a novel architecture, called deepGRU, based on recent findings in the related task of Neural Image Captioning (NIC). The models presented in the following sections lead to the best METEOR translation score for both constrained (English, image) -> German and (English, image) -> French sub-tasks.

Access Paper or Ask Questions

ComboGAN: Unrestrained Scalability for Image Domain Translation

Dec 19, 2017
Asha Anoosheh, Eirikur Agustsson, Radu Timofte, Luc Van Gool

This year alone has seen unprecedented leaps in the area of learning-based image translation, namely CycleGAN, by Zhu et al. But experiments so far have been tailored to merely two domains at a time, and scaling them to more would require an quadratic number of models to be trained. And with two-domain models taking days to train on current hardware, the number of domains quickly becomes limited by the time and resources required to process them. In this paper, we propose a multi-component image translation model and training scheme which scales linearly - both in resource consumption and time required - with the number of domains. We demonstrate its capabilities on a dataset of paintings by 14 different artists and on images of the four different seasons in the Alps. Note that 14 data groups would need (14 choose 2) = 91 different CycleGAN models: a total of 182 generator/discriminator pairs; whereas our model requires only 14 generator/discriminator pairs.

* Source code provided here: 
Access Paper or Ask Questions

Thermal to Visible Image Synthesis under Atmospheric Turbulence

Apr 06, 2022
Kangfu Mei, Yiqun Mei, Vishal M. Patel

In many practical applications of long-range imaging such as biometrics and surveillance, thermal imagining modalities are often used to capture images in low-light and nighttime conditions. However, such imaging systems often suffer from atmospheric turbulence, which introduces severe blur and deformation artifacts to the captured images. Such an issue is unavoidable in long-range imaging and significantly decreases the face verification accuracy. In this paper, we first investigate the problem with a turbulence simulation method on real-world thermal images. An end-to-end reconstruction method is then proposed which can directly transform thermal images into visible-spectrum images by utilizing natural image priors based on a pre-trained StyleGAN2 network. Compared with the existing two-steps methods of consecutive turbulence mitigation and thermal to visible image translation, our method is demonstrated to be effective in terms of both the visual quality of the reconstructed results and face verification accuracy. Moreover, to the best of our knowledge, this is the first work that studies the problem of thermal to visible image translation under atmospheric turbulence.

* 4 pages, 3 figures 
Access Paper or Ask Questions

Mutually improved endoscopic image synthesis and landmark detection in unpaired image-to-image translation

Jul 14, 2021
Lalith Sharan, Gabriele Romano, Sven Koehler, Halvar Kelm, Matthias Karck, Raffaele De Simone, Sandy Engelhardt

The CycleGAN framework allows for unsupervised image-to-image translation of unpaired data. In a scenario of surgical training on a physical surgical simulator, this method can be used to transform endoscopic images of phantoms into images which more closely resemble the intra-operative appearance of the same surgical target structure. This can be viewed as a novel augmented reality approach, which we coined Hyperrealism in previous work. In this use case, it is of paramount importance to display objects like needles, sutures or instruments consistent in both domains while altering the style to a more tissue-like appearance. Segmentation of these objects would allow for a direct transfer, however, contouring of these, partly tiny and thin foreground objects is cumbersome and perhaps inaccurate. Instead, we propose to use landmark detection on the points when sutures pass into the tissue. This objective is directly incorporated into a CycleGAN framework by treating the performance of pre-trained detector models as an additional optimization goal. We show that a task defined on these sparse landmark labels improves consistency of synthesis by the generator network in both domains. Comparing a baseline CycleGAN architecture to our proposed extension (DetCycleGAN), mean precision (PPV) improved by +61.32, mean sensitivity (TPR) by +37.91, and mean F1 score by +0.4743. Furthermore, it could be shown that by dataset fusion, generated intra-operative images can be leveraged as additional training data for the detection network itself. The data is released within the scope of the AdaptOR MICCAI Challenge 2021 at, and code at

* Submitted to IEEE JBHI 2021, 13 pages, 8 figures, 4 tables 
Access Paper or Ask Questions

Image-to-Image Translation with Conditional Adversarial Networks

Nov 22, 2017
Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei A. Efros

We investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems. These networks not only learn the mapping from input image to output image, but also learn a loss function to train this mapping. This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations. We demonstrate that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks. Indeed, since the release of the pix2pix software associated with this paper, a large number of internet users (many of them artists) have posted their own experiments with our system, further demonstrating its wide applicability and ease of adoption without the need for parameter tweaking. As a community, we no longer hand-engineer our mapping functions, and this work suggests we can achieve reasonable results without hand-engineering our loss functions either.

* CVPR 2017 
* Website: 
Access Paper or Ask Questions

Estimating Image Depth in the Comics Domain

Oct 07, 2021
Deblina Bhattacharjee, Martin Everaert, Mathieu Salzmann, Sabine Süsstrunk

Estimating the depth of comics images is challenging as such images a) are monocular; b) lack ground-truth depth annotations; c) differ across different artistic styles; d) are sparse and noisy. We thus, use an off-the-shelf unsupervised image to image translation method to translate the comics images to natural ones and then use an attention-guided monocular depth estimator to predict their depth. This lets us leverage the depth annotations of existing natural images to train the depth estimator. Furthermore, our model learns to distinguish between text and images in the comics panels to reduce text-based artefacts in the depth estimates. Our method consistently outperforms the existing state-ofthe-art approaches across all metrics on both the DCM and eBDtheque images. Finally, we introduce a dataset to evaluate depth prediction on comics.

* WACV 2022 : Winter Conference on Applications of Computer Vision 
Access Paper or Ask Questions

USIS: Unsupervised Semantic Image Synthesis

Sep 29, 2021
George Eskandar, Mohamed Abdelsamad, Karim Armanious, Bin Yang

Semantic Image Synthesis (SIS) is a subclass of image-to-image translation where a photorealistic image is synthesized from a segmentation mask. SIS has mostly been addressed as a supervised problem. However, state-of-the-art methods depend on a huge amount of labeled data and cannot be applied in an unpaired setting. On the other hand, generic unpaired image-to-image translation frameworks underperform in comparison, because they color-code semantic layouts and feed them to traditional convolutional networks, which then learn correspondences in appearance instead of semantic content. In this initial work, we propose a new Unsupervised paradigm for Semantic Image Synthesis (USIS) as a first step towards closing the performance gap between paired and unpaired settings. Notably, the framework deploys a SPADE generator that learns to output images with visually separable semantic classes using a self-supervised segmentation loss. Furthermore, in order to match the color and texture distribution of real images without losing high-frequency information, we propose to use whole image wavelet-based discrimination. We test our methodology on 3 challenging datasets and demonstrate its ability to generate multimodal photorealistic images with an improved quality in the unpaired setting.

Access Paper or Ask Questions

Image Generation from Sketch Constraint Using Contextual GAN

Jul 26, 2018
Yongyi Lu, Shangzhe Wu, Yu-Wing Tai, Chi-Keung Tang

In this paper we investigate image generation guided by hand sketch. When the input sketch is badly drawn, the output of common image-to-image translation follows the input edges due to the hard condition imposed by the translation process. Instead, we propose to use sketch as weak constraint, where the output edges do not necessarily follow the input edges. We address this problem using a novel joint image completion approach, where the sketch provides the image context for completing, or generating the output image. We train a generated adversarial network, i.e, contextual GAN to learn the joint distribution of sketch and the corresponding image by using joint images. Our contextual GAN has several advantages. First, the simple joint image representation allows for simple and effective learning of joint distribution in the same image-sketch space, which avoids complicated issues in cross-domain learning. Second, while the output is related to its input overall, the generated features exhibit more freedom in appearance and do not strictly align with the input features as previous conditional GANs do. Third, from the joint image's point of view, image and sketch are of no difference, thus exactly the same deep joint image completion network can be used for image-to-sketch generation. Experiments evaluated on three different datasets show that our contextual GAN can generate more realistic images than state-of-the-art conditional GANs on challenging inputs and generalize well on common categories.

* ECCV 2018 
Access Paper or Ask Questions