"Image To Image Translation": models, code, and papers

Few-Shot Unsupervised Image-to-Image Translation on complex scenes

Jun 07, 2021
Luca Barras, Samuel Chassot, Daniel Filipe Nunes Silva

Unsupervised image-to-image translation methods have received a lot of attention in the last few years. Multiple techniques have emerged, tackling the initial challenge from different perspectives. Some focus on learning as much as possible from several target style images for translation, while others make use of object detection in order to produce more realistic results on content-rich scenes. In this work, we assess how a method that was initially developed for single-object translation performs on more diverse and content-rich images. Our work is based on the FUNIT[1] framework, and we train it with a more diverse dataset. This helps in understanding how such methods behave beyond their initial frame of application. We present a way to extend a dataset based on object detection. Moreover, we propose a way to adapt the FUNIT framework in order to leverage the power of object detection that one can see in other methods.

Semantics-Aware Image to Image Translation and Domain Transfer

Apr 03, 2019
Pravakar Roy, Nicolai Häni, Volkan Isler

Image-to-image translation is the problem of transferring an image from a source domain to a target domain. We present a new method to transfer the underlying semantics of an image even when there are geometric changes across the two domains. Specifically, we present a Generative Adversarial Network (GAN) that can transfer semantic information presented as segmentation masks. Our main technical contribution is an encoder-decoder based generator architecture that jointly encodes the image and its underlying semantics and translates both simultaneously to the target domain. Additionally, we propose object transfiguration and cross-domain semantic consistency losses that preserve the underlying semantic label maps. We demonstrate the effectiveness of our approach in multiple object transfiguration and domain transfer tasks through qualitative and quantitative experiments. The results show that our method is better at transferring image semantics than state-of-the-art image-to-image translation methods.

Controlling biases and diversity in diverse image-to-image translation

Jul 23, 2019
Yaxing Wang, Abel Gonzalez-Garcia, Joost van de Weijer, Luis Herranz

The task of unpaired image-to-image translation is highly challenging due to the lack of explicit cross-domain pairs of instances. We consider here diverse image translation (DIT), an even more challenging setting in which an image can have multiple plausible translations. This is normally achieved by explicitly disentangling content and style in the latent representation and sampling different style codes while maintaining the image content. Despite the success of current DIT models, they are prone to suffer from bias. In this paper, we study the problem of bias in image-to-image translation. Biased datasets may add undesired changes (e.g. change gender or race in face images) to the output translations as a consequence of the particular underlying visual distribution in the target domain. In order to alleviate the effects of this problem, we propose the use of semantic constraints that enforce the preservation of desired image properties. Our proposed model is a step towards unbiased diverse image-to-image translation (UDIT), and results in fewer unwanted changes in the translated images while still performing the wanted transformation. Experiments on several heavily biased datasets show the effectiveness of the proposed techniques in different domains such as faces, objects, and scenes.

* The paper is under consideration at Computer Vision and Image Understanding

A Novel Application of Image-to-Image Translation: Chromosome Straightening Framework by Learning from a Single Image

Mar 04, 2021
Sifan Song, Daiyun Huang, Yalun Hu, Chunxiao Yang, Jia Meng, Fei Ma, Jiaming Zhang, Jionglong Su

In medical imaging, chromosome straightening plays a significant role in the pathological study of chromosomes and in the development of cytogenetic maps. Whereas different approaches exist for the straightening task, they are mostly geometric algorithms whose outputs are characterized by jagged edges or fragments with discontinued banding patterns. To address the flaws in the geometric algorithms, we propose a novel framework based on image-to-image translation to learn a pertinent mapping dependence for synthesizing straightened chromosomes with uninterrupted banding patterns and preserved details. In addition, to avoid the pitfall of deficient input chromosomes, we construct an augmented dataset using only a single curved chromosome image for training models. Based on this framework, we apply two popular image-to-image translation architectures, U-shape networks and conditional generative adversarial networks, to assess its efficacy. Experiments on a dataset comprising 642 real-world chromosomes demonstrate the superiority of our framework over the geometric method in straightening performance by rendering realistic and continuous chromosome details. Furthermore, our straightened results improve chromosome classification, with gains of 0.98%-1.39% in mean accuracy.

Towards Instance-level Image-to-Image Translation

May 05, 2019
Zhiqiang Shen, Mingyang Huang, Jianping Shi, Xiangyang Xue, Thomas Huang

Unpaired image-to-image translation is an emerging and challenging vision problem that aims to learn a mapping between unaligned image pairs in diverse domains. Recent advances in this field like MUNIT and DRIT mainly focus on disentangling content and style/attribute from a given image first, then directly adopting the global style to guide the model to synthesize new domain images. However, this kind of approach incurs severe contradictions if the target domain images are content-rich with multiple discrepant objects. In this paper, we present a simple yet effective instance-aware image-to-image translation approach (INIT), which applies the fine-grained local (instance) and global styles to the target image spatially. The proposed INIT exhibits three important advantages: (1) the instance-level objective loss can help learn a more accurate reconstruction and incorporate diverse attributes of objects; (2) the styles used for the target domain of local/global areas are from corresponding spatial regions in the source domain, which intuitively is a more reasonable mapping; (3) the joint training process can benefit both fine and coarse granularity and incorporates instance information to improve the quality of global translation. We also collect a large-scale benchmark for the new instance-level translation task. We observe that our synthetic images can even benefit real-world vision tasks like generic object detection.

* Accepted to CVPR 2019. Project page: http://zhiqiangshen.com/projects/INIT/index.html

SPatchGAN: A Statistical Feature Based Discriminator for Unsupervised Image-to-Image Translation

Mar 30, 2021
Xuning Shao, Weidong Zhang

For unsupervised image-to-image translation, we propose a discriminator architecture which focuses on the statistical features instead of individual patches. The network is stabilized by distribution matching of key statistical features at multiple scales. Unlike the existing methods which impose more and more constraints on the generator, our method facilitates the shape deformation and enhances the fine details with a greatly simplified framework. We show that the proposed method outperforms the existing state-of-the-art models in various challenging applications including selfie-to-anime, male-to-female and glasses removal. The code will be made publicly available.

Single Image LDR to HDR Conversion using Conditional Diffusion

Jul 06, 2023
Dwip Dalal, Gautam Vashishtha, Prajwal Singh, Shanmuganathan Raman

Digital imaging aims to replicate realistic scenes, but Low Dynamic Range (LDR) cameras cannot represent the wide dynamic range of real scenes, resulting in under-/overexposed images. This paper presents a deep learning-based approach for recovering intricate details from shadows and highlights while reconstructing High Dynamic Range (HDR) images. We formulate the problem as an image-to-image (I2I) translation task and propose a conditional Denoising Diffusion Probabilistic Model (DDPM) based framework using classifier-free guidance. We incorporate a deep CNN-based autoencoder in our proposed framework to enhance the quality of the latent representation of the input LDR image used for conditioning. Moreover, we introduce a new loss function for LDR-HDR translation tasks, termed Exposure Loss. This loss helps direct gradients in the opposite direction of the saturation, further improving the results' quality. By conducting comprehensive quantitative and qualitative experiments, we have effectively demonstrated the proficiency of our proposed method. The results indicate that a simple conditional diffusion-based method can replace the complex camera pipeline-based architectures.

* IEEE International Conference on Image Processing 2023
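
The classifier-free guidance mentioned in the abstract above is commonly implemented by blending a conditional and an unconditional noise prediction at each denoising step. The sketch below shows only that blending step in plain Python. The function name `guided_noise`, the list-of-floats representation, and the `scale` convention (0 gives the unconditional prediction, 1 the conditional one) are assumptions made for illustration and are not taken from the paper.

```python
def guided_noise(eps_uncond, eps_cond, scale):
    """Blend two noise predictions as in classifier-free guidance.

    Convention assumed here:
        eps = eps_uncond + scale * (eps_cond - eps_uncond)
    Other write-ups parameterize the guidance weight differently.
    """
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("predictions must have the same length")
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Hypothetical noise predictions from one denoiser, evaluated without
# and with the LDR conditioning signal (values are made up).
eps_uncond = [0.0, 1.0]
eps_cond = [1.0, 3.0]

# A scale above 1 pushes the result past the conditional prediction.
print(guided_noise(eps_uncond, eps_cond, 2.0))
```

With this convention, a scale greater than 1 extrapolates away from the unconditional prediction, which is the usual way guidance strength is increased.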

Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation

Aug 03, 2020
Elad Richardson, Yuval Alaluf, Or Patashnik, Yotam Nitzan, Yaniv Azar, Stav Shapiro, Daniel Cohen-Or

We present a generic image-to-image translation framework, Pixel2Style2Pixel (pSp). Our pSp framework is based on a novel encoder network that directly generates a series of style vectors which are fed into a pretrained StyleGAN generator, forming the extended W+ latent space. We first show that our encoder can directly embed real images into W+, with no additional optimization. We further introduce a dedicated identity loss which is shown to achieve improved performance in the reconstruction of an input image. We demonstrate pSp to be a simple architecture that, by leveraging a well-trained, fixed generator network, can be easily applied to a wide range of image-to-image translation tasks. Solving these tasks through the style representation results in a global approach that does not rely on a local pixel-to-pixel correspondence and further supports multi-modal synthesis via the resampling of styles. Notably, we demonstrate that pSp can be trained to align a face image to a frontal pose without any labeled data, generate multi-modal results for ambiguous tasks such as conditional face generation from segmentation maps, and construct high-resolution images from corresponding low-resolution images.

Unsupervised Image to Image Translation for Multiple Retinal Pathology Synthesis in Optical Coherence Tomography Scans

Dec 11, 2021
Hemanth Pasupuleti, G. N. Girish

Image-to-Image Translation (I2I) is a challenging computer vision problem used in numerous domains for multiple tasks. Recently, ophthalmology has become one of the major fields where the application of I2I is increasing rapidly. One such application is the generation of synthetic retinal optical coherence tomographic (OCT) scans. Existing I2I methods require training multiple models to translate images from normal scans to a specific pathology, which limits the use of these models due to their complexity. To address this issue, we propose an unsupervised multi-domain I2I network with a pre-trained style encoder that translates retinal OCT images in one domain to multiple domains. We assume that the image splits into domain-invariant content and domain-specific style codes, and pre-train these style codes. The performed experiments show that the proposed model outperforms state-of-the-art models like MUNIT and CycleGAN in synthesizing diverse pathological scans.

RelGAN: Multi-Domain Image-to-Image Translation via Relative Attributes

Aug 20, 2019
Po-Wei Wu, Yu-Jing Lin, Che-Han Chang, Edward Y. Chang, Shih-Wei Liao

Multi-domain image-to-image translation has gained increasing attention recently. Previous methods take an image and some target attributes as inputs and generate an output image with the desired attributes. However, such methods have two limitations. First, these methods assume binary-valued attributes and thus cannot yield satisfactory results for fine-grained control. Second, these methods require specifying the entire set of target attributes, even if most of the attributes would not be changed. To address these limitations, we propose RelGAN, a new method for multi-domain image-to-image translation. The key idea is to use relative attributes, which describe the desired change on selected attributes. Our method is capable of modifying images by changing particular attributes of interest in a continuous manner while preserving the other attributes. Experimental results demonstrate both the quantitative and qualitative effectiveness of our method on the tasks of facial attribute transfer and interpolation.

* Accepted to ICCV 2019
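
The relative-attribute idea described in the abstract above can be illustrated with a toy calculation. The sketch below assumes, for illustration only, that a relative attribute vector is the element-wise difference between a target and a source attribute vector, and that scaling it by a factor between 0 and 1 expresses a partial change. The function names and the attribute ordering are hypothetical and are not taken from the paper's code.

```python
def relative_attributes(source, target):
    """Element-wise change needed to go from source to target attributes."""
    if len(source) != len(target):
        raise ValueError("attribute vectors must have the same length")
    return [t - s for s, t in zip(source, target)]


def scale_attributes(relative, alpha):
    """Scale a relative attribute vector to request a partial change."""
    return [alpha * r for r in relative]


# Hypothetical attribute order: [black_hair, smiling, eyeglasses]
source = [1.0, 0.0, 0.0]
target = [1.0, 1.0, 0.0]

# Only "smiling" changes; untouched attributes get a zero entry.
rel = relative_attributes(source, target)

# Request half of the change, i.e. a weaker smile.
half = scale_attributes(rel, 0.5)
print(rel, half)
```

Under this reading, attributes that should stay the same simply receive a zero, so the full target attribute set never has to be specified, and intermediate scale factors give the continuous control the abstract refers to.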
