"Image To Image Translation": models, code, and papers

DeepHist: Differentiable Joint and Color Histogram Layers for Image-to-Image Translation

May 06, 2020
Mor Avi-Aharon, Assaf Arbelle, Tammy Riklin Raviv

We present DeepHist, a novel deep learning framework for augmenting a network with histogram layers, and demonstrate its strength by addressing image-to-image translation problems. Specifically, given an input image and a reference color distribution, we aim to generate an output image with the structural appearance (content) of the input (source) yet with the colors of the reference. The key idea is a new technique for the differentiable construction of joint and color histograms of the output images. We further define a color distribution loss based on the Earth Mover's Distance between the output's and the reference's color histograms, and a Mutual Information loss based on the joint histograms of the source and the output images. Promising results are shown for the tasks of color transfer, image colorization and edges $\rightarrow$ photo, where the color distribution of the output image is controlled. Comparisons to Pix2Pix and CycleGAN are shown.
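As a rough illustration of the two ingredients named in this abstract, the sketch below builds a smooth (kernel-based) histogram and compares two histograms with a 1D Earth Mover's Distance computed from their cumulative sums. It is not the authors' implementation: the Gaussian kernel, the `bandwidth` value, the per-pixel normalization, and the function names `soft_histogram` and `emd_1d` are all assumptions made for illustration. The code is plain Python; the same arithmetic written in an autodiff framework would be differentiable with respect to the pixel values.

```python
import math

def soft_histogram(values, num_bins=8, bandwidth=0.05):
    """Smooth histogram of values in [0, 1] using Gaussian kernels (assumed kernel)."""
    centers = [(k + 0.5) / num_bins for k in range(num_bins)]
    hist = [0.0] * num_bins
    for v in values:
        # Each pixel spreads a smooth weight over all bins instead of a hard count.
        weights = [math.exp(-0.5 * ((v - c) / bandwidth) ** 2) for c in centers]
        total = sum(weights)
        for k in range(num_bins):
            hist[k] += weights[k] / total
    n = len(values)
    return [h / n for h in hist]

def emd_1d(p, q):
    """1D Earth Mover's Distance between two normalized histograms (unit bin width)."""
    cdf_p = cdf_q = 0.0
    distance = 0.0
    for a, b in zip(p, q):
        cdf_p += a
        cdf_q += b
        # In 1D, the EMD equals the summed absolute difference of the CDFs.
        distance += abs(cdf_p - cdf_q)
    return distance

# Hypothetical single-channel intensities for an output and a reference image.
output_pixels = [0.10, 0.15, 0.20, 0.25]
reference_pixels = [0.70, 0.80, 0.85, 0.90]
color_loss = emd_1d(soft_histogram(output_pixels), soft_histogram(reference_pixels))
print(color_loss)
```

The cumulative-sum form only applies to one-dimensional histograms; a loss over full 3D color histograms would need a different formulation.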

* arXiv admin note: text overlap with arXiv:1912.06044

LSC-GAN: Latent Style Code Modeling for Continuous Image-to-image Translation

Oct 11, 2021
Qiusheng Huang, Xueqi Hu, Li Sun, Qingli Li

Image-to-image (I2I) translation is usually carried out among discrete domains. However, image domains, often corresponding to a physical value, are usually continuous. In other words, images change gradually with the value, and there is no obvious gap between different domains. This paper aims to build a model for I2I translation among continuously varying domains. We first divide the whole domain coverage into discrete intervals, and explicitly model the latent style code for the center of each interval. To deal with continuous translation, we design editing modules that change the latent style code along two directions. These editing modules help to constrain the codes for domain centers during training, so that the model can better understand the relation among them. To obtain diverse results, the latent style code is further diversified with either random noise or features from the reference image, giving an individual style code to the decoder for label-based or reference-based synthesis. Extensive experiments on age and viewing angle translation show that the proposed method can achieve high-quality results, and that it is also flexible for users.

Anime-to-Real Clothing: Cosplay Costume Generation via Image-to-Image Translation

Aug 26, 2020
Koya Tango, Marie Katsurai, Hayato Maki, Ryosuke Goto

Cosplay has grown from its origins at fan conventions into a billion-dollar global dress phenomenon. To facilitate imagination and reinterpretation from animated images to real garments, this paper presents an automatic costume image generation method based on image-to-image translation. Cosplay items can be significantly diverse in their styles and shapes, and conventional methods cannot be directly applied to the wide variation in clothing images that are the focus of this study. To solve this problem, our method starts by collecting and preprocessing web images to prepare a cleaned, paired dataset of the anime and real domains. Then, we present a novel architecture for generative adversarial networks (GANs) to facilitate high-quality cosplay image generation. Our GAN consists of several effective techniques to fill the gap between the two domains and improve both the global and local consistency of generated images. Experiments demonstrated that, with two types of evaluation metrics, the proposed GAN achieves better performance than existing methods. We also showed that the images generated by the proposed method are more realistic than those generated by the conventional methods. Our codes and pretrained model are available on the web.

* 19 pages

Unsupervised Multi-Modal Image Registration via Geometry Preserving Image-to-Image Translation

Mar 18, 2020
Moab Arar, Yiftach Ginger, Dov Danon, Ilya Leizerson, Amit Bermano, Daniel Cohen-Or

Many applications, such as autonomous driving, rely heavily on multi-modal data where spatial alignment between the modalities is required. Most multi-modal registration methods struggle to compute the spatial correspondence between the images using prevalent cross-modality similarity measures. In this work, we bypass the difficulties of developing cross-modality similarity measures by training an image-to-image translation network on the two input modalities. This learned translation allows training the registration network using simple and reliable mono-modality metrics. We perform multi-modal registration using two networks: a spatial transformation network and a translation network. We show that by encouraging our translation network to be geometry preserving, we manage to train an accurate spatial transformation network. Compared to state-of-the-art multi-modal methods, our method is unsupervised, requiring no pairs of aligned modalities for training, and can be adapted to any pair of modalities. We evaluate our method quantitatively and qualitatively on commercial datasets, showing that it performs well on several modalities and achieves accurate alignment.

InstaGAN: Instance-aware Image-to-Image Translation

Jan 02, 2019
Sangwoo Mo, Minsu Cho, Jinwoo Shin

Unsupervised image-to-image translation has gained considerable attention due to the recent impressive progress based on generative adversarial networks (GANs). However, previous methods often fail in challenging cases, in particular when an image has multiple target instances and the translation task involves significant changes in shape, e.g., translating pants to skirts in fashion images. To tackle these issues, we propose a novel method, coined instance-aware GAN (InstaGAN), that incorporates instance information (e.g., object segmentation masks) and improves multi-instance transfiguration. The proposed method translates both an image and the corresponding set of instance attributes while maintaining the permutation invariance property of the instances. To this end, we introduce a context preserving loss that encourages the network to learn the identity function outside of target instances. We also propose a sequential mini-batch inference/training technique that handles multiple instances with limited GPU memory and helps the network generalize better to multiple instances. Our comparative evaluation demonstrates the effectiveness of the proposed method on different image datasets, in particular in the aforementioned challenging cases. Code and results are available at https://github.com/sangwoomo/instagan
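The idea of a loss that pushes the network toward the identity function outside the target instances can be sketched as follows. This is a minimal illustration, not the formula from the paper or its repository: it assumes binary masks, treats a pixel as background only when it lies outside both the original and the translated mask, and uses a mean absolute difference. The function name and the nested-list image representation are hypothetical.

```python
def context_preserving_loss(source, translated, mask_before, mask_after):
    """Mean absolute difference over pixels lying outside both instance masks."""
    total = 0.0
    count = 0
    for row_s, row_t, row_a, row_b in zip(source, translated, mask_before, mask_after):
        for s, t, a, b in zip(row_s, row_t, row_a, row_b):
            # Assumption: background means outside the original AND translated mask.
            if a == 0 and b == 0:
                total += abs(s - t)
                count += 1
    # With no background pixels there is nothing to preserve.
    return total / count if count else 0.0

# Hypothetical 2x2 grayscale images and binary instance masks.
source = [[0.2, 0.5], [0.9, 0.1]]
translated = [[0.2, 0.8], [0.9, 0.3]]
mask_before = [[0, 1], [0, 0]]
mask_after = [[0, 1], [0, 0]]
print(context_preserving_loss(source, translated, mask_before, mask_after))
```

Changes inside the masked pixel are ignored, so the generator stays free to alter the instance's shape and appearance while being penalized for touching the surrounding context.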

* Accepted to ICLR 2019. High resolution images are available at https://github.com/sangwoomo/instagan

Generating large labeled data sets for laparoscopic image processing tasks using unpaired image-to-image translation

Jul 05, 2019
Micha Pfeiffer, Isabel Funke, Maria R. Robu, Sebastian Bodenstedt, Leon Strenger, Sandy Engelhardt, Tobias Roß, Matthew J. Clarkson, Kurinchi Gurusamy, Brian R. Davidson, Lena Maier-Hein, Carina Riediger, Thilo Welsch, Jürgen Weitz, Stefanie Speidel

In the medical domain, the lack of large training data sets and benchmarks is often a limiting factor for training deep neural networks. In contrast to expensive manual labeling, computer simulations can generate large and fully labeled data sets with a minimum of manual effort. However, models that are trained on simulated data usually do not translate well to real scenarios. To bridge the domain gap between simulated and real laparoscopic images, we exploit recent advances in unpaired image-to-image translation. We extend an image-to-image translation method to generate a diverse multitude of realistic-looking synthetic images based on images from a simple laparoscopy simulation. By incorporating means to ensure that the image content is preserved during the translation process, we ensure that the labels given for the simulated images remain valid for their realistic-looking translations. This way, we are able to generate a large, fully labeled synthetic data set of laparoscopic images with realistic appearance. We show that this data set can be used to train models for the task of liver segmentation in laparoscopic images. We achieve average Dice scores of up to 0.89 in some patients without manually labeling a single laparoscopic image and show that using our synthetic data to pre-train models can greatly improve their performance. The synthetic data set will be made publicly available, fully labeled with segmentation maps, depth maps, normal maps, and positions of tools and camera (http://opencas.dkfz.de/image2image).
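For readers unfamiliar with the metric reported above, the Dice score for a binary segmentation is 2|A∩B| / (|A| + |B|), where A is the predicted mask and B the ground-truth mask. The sketch below is a minimal illustration on flattened binary masks; the function name and the convention of returning 1.0 for two empty masks are assumptions, and it is not taken from the paper's evaluation code.

```python
def dice_score(pred, target):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for flat binary masks."""
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    size_sum = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if size_sum == 0:
        return 1.0  # assumed convention: two empty masks match perfectly
    return 2.0 * intersection / size_sum

# Hypothetical flattened liver masks: 1 = liver pixel, 0 = background.
predicted = [1, 1, 0, 0]
ground_truth = [1, 0, 0, 0]
print(dice_score(predicted, ground_truth))
```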

* Accepted at MICCAI 2019

Depth- and Semantics-aware Multi-modal Domain Translation: Generating 3D Panoramic Color Images from LiDAR Point Clouds

Feb 15, 2023
Tiago Cortinhal, Eren Erdal Aksoy

This work presents a new depth- and semantics-aware conditional generative model, named TITAN-Next, for cross-domain image-to-image translation in a multi-modal setup between LiDAR and camera sensors. The proposed model leverages scene semantics as a mid-level representation and is able to translate raw LiDAR point clouds to RGB-D camera images by relying solely on semantic scene segments. We claim that this is the first framework of its kind and that it has practical applications in autonomous vehicles, such as providing a fail-safe mechanism and augmenting available data in the target image domain. The proposed model is evaluated on the large-scale and challenging Semantic-KITTI dataset, and experimental findings show that it considerably outperforms the original TITAN-Net and other strong baselines, by a margin of 23.7% in terms of IoU.

Generative Modeling with Flow-Guided Density Ratio Learning

Mar 07, 2023
Alvin Heng, Abdul Fatir Ansari, Harold Soh

We present Flow-Guided Density Ratio Learning (FDRL), a simple and scalable approach to generative modeling which builds on the stale (time-independent) approximation of the gradient flow of entropy-regularized f-divergences introduced in DGflow. In DGflow, the intractable time-dependent density ratio is approximated by a stale estimator given by a GAN discriminator. This is sufficient in the case of sample refinement, where the source and target distributions of the flow are close to each other. However, this assumption is invalid for generation and a naive application of the stale estimator fails due to the large chasm between the two distributions. FDRL proposes to train a density ratio estimator such that it learns from progressively improving samples during the training process. We show that this simple method alleviates the density chasm problem, allowing FDRL to generate images of dimensions as high as $128\times128$, as well as outperform existing gradient flow baselines on quantitative benchmarks. We also show the flexibility of FDRL with two use cases. First, unconditional FDRL can be easily composed with external classifiers to perform class-conditional generation. Second, FDRL can be directly applied to unpaired image-to-image translation with no modifications needed to the framework. Code is publicly available at https://github.com/ajrheng/FDRL.

Generative Adversarial Network with Multi-Branch Discriminator for Cross-Species Image-to-Image Translation

Jan 24, 2019
Ziqiang Zheng, Zhibin Yu, Haiyong Zheng, Yang Wu, Bing Zheng, Ping Lin

Current approaches have made great progress on image-to-image translation tasks, benefiting from the success of image synthesis methods, especially generative adversarial networks (GANs). However, existing methods are limited to handling translation tasks between two species while keeping the content matching on the semantic level. A more challenging task would be translation among more than two species. To explore this new area, we propose a simple yet effective multi-branch discriminator structure for enhancing an arbitrary generative adversarial network (GAN) architecture, named GAN-MBD. It takes advantage of the boosting strategy to break a common discriminator into several smaller ones with fewer parameters, which can enhance the generation and synthesis abilities of GANs efficiently and effectively. Comprehensive experiments show that the proposed multi-branch discriminator can dramatically improve the performance of popular GANs on cross-species image-to-image translation tasks while reducing the number of parameters for computation. The code and some datasets are attached as supplementary materials for reference.

* 10 pages, 16 figures

TuiGAN: Learning Versatile Image-to-Image Translation with Two Unpaired Images

Apr 09, 2020
Jianxin Lin, Yingxue Pang, Yingce Xia, Zhibo Chen, Jiebo Luo

An unsupervised image-to-image translation (UI2I) task deals with learning a mapping between two domains without paired images. While existing UI2I methods usually require numerous unpaired images from different domains for training, there are many scenarios where training data is quite limited. In this paper, we argue that even if each domain contains a single image, UI2I can still be achieved. To this end, we propose TuiGAN, a generative model that is trained on only two unpaired images and amounts to one-shot unsupervised learning. With TuiGAN, an image is translated in a coarse-to-fine manner where the generated image is gradually refined from global structures to local details. We conduct extensive experiments to verify that our versatile method can outperform strong baselines on a wide variety of UI2I tasks. Moreover, TuiGAN is capable of achieving comparable performance with the state-of-the-art UI2I models trained with sufficient data.

* 19 pages, 12 figures
