"Image To Image Translation": models, code, and papers

Panoptic-based Object Style-Align for Image-to-Image Translation

Dec 03, 2021
Liyun Zhang, Photchara Ratsamee, Bowen Wang, Manabu Higashida, Yuki Uranishi, Haruo Takemura

Despite remarkable recent progress in image translation, complex scenes with multiple discrepant objects remain a challenging problem: the translated images have low fidelity, tiny objects are rendered with fewer details, and object recognition on the results performs unsatisfactorily. Without thorough object perception (i.e., bounding boxes, categories, and masks) of the image as prior knowledge, the style transformation of each object is difficult to track during the image translation process. We propose panoptic-based object style-align generative adversarial networks (POSA-GANs) for image-to-image translation, together with a compact panoptic segmentation dataset. The panoptic segmentation model is used to extract panoptic-level perception (i.e., overlap-removed foreground object instances and background semantic regions in the image). This perception guides the alignment between the object content codes of the input domain image and the object style codes sampled from the style space of the target domain. The style-aligned object representations are further transformed to obtain a precise boundary layout for higher-fidelity object generation. The proposed method was systematically compared with competing methods and obtained significant improvements in both image quality and object recognition performance for translated images.

Semantically-aware Mask CycleGAN for Translating Artistic Portraits to Photo-realistic Visualizations

Jun 11, 2023
Zhuohao Yin

Image-to-image translation (I2I) is defined as a computer vision task where the aim is to transfer images in a source domain to a target domain with minimal loss or alteration of the content representations. Major progress has been made since I2I was proposed, driven by the invention of a variety of revolutionary generative models. Among them, GAN-based models perform exceptionally well, as they are mostly tailor-made for specific domains or tasks. However, few works have proposed a tailor-made method for the artistic domain. In this project, I propose the Semantic-aware Mask CycleGAN (SMCycleGAN) architecture, which can translate artistic portraits to photo-realistic visualizations. This model can generate realistic human portraits by feeding the discriminators semantically masked fake samples, thus forcing them to make discriminative decisions with partial information, so that the generators are optimized to synthesize more realistic human portraits instead of increasing the similarity of other irrelevant components, such as the background. Experiments show that the SMCycleGAN generates images with significantly increased realism and minimal loss of content representations.
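
A minimal sketch of the masking idea described in this abstract, assuming the semantic mask is a binary map (1 for portrait pixels, 0 for background) and that masking is an element-wise product. The function `apply_mask` and the toy values are hypothetical and are not taken from the paper.

```python
def apply_mask(image, mask):
    """Zero out pixels where the binary mask is 0 (element-wise product)."""
    return [[pixel * keep for pixel, keep in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]

# Toy 3x3 grayscale "fake sample" from a generator (hypothetical values).
fake = [[0.9, 0.8, 0.1],
        [0.7, 0.6, 0.2],
        [0.3, 0.4, 0.5]]

# Hypothetical semantic mask: 1 marks the portrait region, 0 the background.
mask = [[1, 1, 0],
        [1, 1, 0],
        [0, 0, 0]]

# The discriminator would see only the portrait region of the fake sample.
masked_fake = apply_mask(fake, mask)
```

With the background zeroed out, a discriminator receiving `masked_fake` cannot base its decision on background pixels, which is the partial-information effect the abstract describes.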

Asymmetric Generative Adversarial Networks for Image-to-Image Translation

Dec 14, 2019
Hao Tang, Dan Xu, Hong Liu, Nicu Sebe

State-of-the-art models for unpaired image-to-image translation with Generative Adversarial Networks (GANs) can learn the mapping from the source domain to the target domain using a cycle-consistency loss. The intuition behind these models is that if we translate from one domain to the other and back again, we should arrive where we started. However, existing methods always adopt a symmetric network architecture to learn both forward and backward cycles. Because of the task complexity and the difference in cycle inputs between the source and target image domains, the inequality between the forward and backward cycle translations is significant, and the amount of information in the two domains differs. In this paper, we analyze the limitation of the existing symmetric GAN models in asymmetric translation tasks, and propose an AsymmetricGAN model with translation and reconstruction generators of unequal sizes and a different parameter-sharing strategy, to adapt to the asymmetric need in both unsupervised and supervised image-to-image translation tasks. Moreover, the training stage of existing methods has the common problem of model collapse, which degrades the quality of the generated images; we therefore explore different optimization losses for better training of AsymmetricGAN, making image-to-image translation more consistent and more stable. Extensive experiments on both supervised and unsupervised generative tasks with several publicly available datasets demonstrate that the proposed AsymmetricGAN achieves superior model capacity and better generation performance compared with existing GAN models. To the best of our knowledge, we are the first to investigate the asymmetric GAN framework on both unsupervised and supervised image-to-image translation tasks. The source code, data and trained models are available at https://github.com/Ha0Tang/AsymmetricGAN.

* An extended version of a paper published in ACCV2018. arXiv admin note: substantial text overlap with arXiv:1901.04604
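
The cycle-consistency intuition in this abstract (translate to the other domain and back, then compare with the starting point) can be sketched as follows. This is an illustration under stated assumptions, not the AsymmetricGAN implementation: the L1 distance, the toy "generators" `g`, `f_exact`, and `f_rough`, and all values are hypothetical.

```python
def l1_distance(a, b):
    """Mean absolute difference between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def cycle_consistency_loss(x, forward, backward):
    """L1 distance between x and its round trip backward(forward(x))."""
    return l1_distance(x, backward(forward(x)))

# Hypothetical toy "generators": g maps domain A to B, f maps B back to A.
def g(v):
    return [2.0 * t + 1.0 for t in v]

def f_exact(v):
    # Exact inverse of g: the round trip recovers the input.
    return [(t - 1.0) / 2.0 for t in v]

def f_rough(v):
    # Imperfect inverse of g: the round trip lands away from the input.
    return [t / 2.0 for t in v]

x = [0.0, 1.0, 2.0]
loss_exact = cycle_consistency_loss(x, g, f_exact)
loss_rough = cycle_consistency_loss(x, g, f_rough)
```

The loss is zero only when the backward mapping undoes the forward one, which is why minimizing it couples the two generators during training.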

A Novel Framework for Image-to-image Translation and Image Compression

Nov 25, 2021
Fei Yang, Yaxing Wang, Luis Herranz, Yongmei Cheng, Mikhail Mozerov

Data-driven paradigms using machine learning are becoming ubiquitous in image processing and communications. In particular, image-to-image (I2I) translation is a generic and widely used approach to image processing problems, such as image synthesis, style transfer, and image restoration. At the same time, neural image compression has emerged as a data-driven alternative to traditional coding approaches in visual communications. In this paper, we study the combination of these two paradigms into a joint I2I compression and translation framework, focusing on multi-domain image synthesis. We first propose distributed I2I translation by integrating quantization and entropy coding into an I2I translation framework (i.e., I2Icodec). In practice, the image compression functionality (i.e., autoencoding) is also desirable, which would require deploying a regular image codec alongside I2Icodec. Thus, we further propose a unified framework that provides both translation and autoencoding capabilities in a single codec. Adaptive residual blocks conditioned on the translation/compression mode provide flexible adaptation to the desired functionality. The experiments show promising results in both I2I translation and image compression using a single model.
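
To make the "quantization and entropy coding" step concrete, here is a rough sketch. The abstract does not specify the quantizer or the entropy model, so uniform rounding and the empirical Shannon entropy are stand-in assumptions, and `quantize`, `empirical_entropy_bits`, and the latent values are hypothetical.

```python
import math
from collections import Counter

def quantize(latent, step=1.0):
    """Uniform scalar quantization: index of the nearest multiple of step."""
    return [round(v / step) for v in latent]

def empirical_entropy_bits(symbols):
    """Shannon entropy (bits per symbol) of the empirical symbol distribution."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical latent values produced by an encoder.
latent = [0.2, -0.1, 0.9, 1.1, 0.4, -0.3, 1.2, 0.1]

symbols = quantize(latent)                      # integers an entropy coder would compress
bits_per_symbol = empirical_entropy_bits(symbols)
```

The entropy gives a lower bound on the average number of bits per symbol that a lossless entropy coder needs for this symbol distribution, so a more peaked distribution of quantized latents means a smaller bitstream.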

Unpaired Image-to-Image Translation using Adversarial Consistency Loss

Mar 10, 2020
Yihao Zhao, Ruihai Wu, Hao Dong

Unpaired image-to-image translation is a class of vision problems whose goal is to find the mapping between different image domains using unpaired training data. Cycle-consistency loss is a widely used constraint for such problems. However, due to the strict pixel-level constraint, it cannot perform geometric changes, remove large objects, or ignore irrelevant texture. In this paper, we propose a novel adversarial-consistency loss for image-to-image translation. This loss does not require the translated image to be translated back to be a specific source image but can encourage the translated images to retain important features of the source images and overcome the drawbacks of cycle-consistency loss noted above. Our method achieves state-of-the-art results on three challenging tasks: glasses removal, male-to-female translation, and selfie-to-anime translation.

Unsupervised Image-to-Image Translation Networks

Jul 23, 2018
Ming-Yu Liu, Thomas Breuel, Jan Kautz

Unsupervised image-to-image translation aims at learning a joint distribution of images in different domains by using images from the marginal distributions in individual domains. Since there exists an infinite set of joint distributions that can arrive at the given marginal distributions, one could infer nothing about the joint distribution from the marginal distributions without additional assumptions. To address the problem, we make a shared-latent space assumption and propose an unsupervised image-to-image translation framework based on Coupled GANs. We compare the proposed framework with competing approaches and present high-quality image translation results on various challenging unsupervised image translation tasks, including street scene image translation, animal image translation, and face image translation. We also apply the proposed framework to domain adaptation and achieve state-of-the-art performance on benchmark datasets. Code and additional results are available at https://github.com/mingyuliutw/unit.

* NIPS 2017, 11 pages, 6 figures
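
The claim that marginals do not determine the joint distribution can be checked on a tiny worked example. The two 2x2 probability tables below are hypothetical and only illustrate the statement; they have nothing to do with the image data used in the paper.

```python
def marginals(joint):
    """Return (row marginal, column marginal) of a 2-D joint probability table."""
    rows = [sum(r) for r in joint]
    cols = [sum(c) for c in zip(*joint)]
    return rows, cols

# Two different joint distributions over a pair of binary variables (x1, x2).
independent = [[0.25, 0.25],
               [0.25, 0.25]]   # x1 and x2 are unrelated
coupled = [[0.5, 0.0],
           [0.0, 0.5]]         # x1 always equals x2

m_independent = marginals(independent)
m_coupled = marginals(coupled)
```

Both tables have uniform marginals, yet one pairs the variables perfectly and the other not at all. Any mixture of the two gives yet another joint with the same marginals, which is why an extra assumption (here, the shared latent space) is needed to pick one.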

Global and Local Alignment Networks for Unpaired Image-to-Image Translation

Nov 19, 2021
Guanglei Yang, Hao Tang, Humphrey Shi, Mingli Ding, Nicu Sebe, Radu Timofte, Luc Van Gool, Elisa Ricci

The goal of unpaired image-to-image translation is to produce an output image reflecting the target domain's style while keeping unrelated contents of the input source image unchanged. However, because existing methods pay little attention to content change, the semantic information from source images degrades during translation. In this paper, to address this issue, we introduce a novel approach, Global and Local Alignment Networks (GLA-Net). The global alignment network aims to transfer the input image from the source domain to the target domain. To do so effectively, we learn the parameters (mean and standard deviation) of multivariate Gaussian distributions as style features by using an MLP-Mixer based style encoder. To transfer the style more accurately, we employ an adaptive instance normalization layer in the encoder, with the parameters of the target multivariate Gaussian distribution as input. We also adopt regularization and likelihood losses to further reduce the domain gap and produce high-quality outputs. Additionally, we introduce a local alignment network, which employs a pretrained self-supervised model to produce an attention map via a novel local alignment loss, ensuring that the translation network focuses on relevant pixels. Extensive experiments conducted on five public datasets demonstrate that our method effectively generates sharper and more realistic images than existing approaches. Our code is available at https://github.com/ygjwd12345/GLANet.
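
For readers unfamiliar with adaptive instance normalization, the following sketch shows the standard operation on a single feature channel: normalize the content features, then rescale them to a target mean and standard deviation. It is not the GLA-Net code; the function `adain`, the `eps` constant, and the toy values are assumptions for illustration.

```python
import math

def adain(channel, target_mean, target_std, eps=1e-5):
    """Adaptive instance normalization for one feature channel.

    Normalizes the channel to (approximately) zero mean and unit variance,
    then rescales it to the given target statistics.
    """
    n = len(channel)
    mean = sum(channel) / n
    var = sum((v - mean) ** 2 for v in channel) / n
    std = math.sqrt(var + eps)  # eps guards against division by zero
    return [target_std * (v - mean) / std + target_mean for v in channel]

# Hypothetical content features and target style statistics.
content_channel = [1.0, 2.0, 3.0, 4.0]
stylized = adain(content_channel, target_mean=10.0, target_std=2.0)

out_mean = sum(stylized) / len(stylized)
out_std = math.sqrt(sum((v - out_mean) ** 2 for v in stylized) / len(stylized))
```

The relative ordering of the content features is preserved, while their first- and second-order statistics now match the target, which is how the style parameters described in the abstract would enter the network.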

Memory-guided Unsupervised Image-to-image Translation

Apr 12, 2021
Somi Jeong, Youngjung Kim, Eungbean Lee, Kwanghoon Sohn

We present a novel unsupervised framework for instance-level image-to-image translation. Although recent advances have been made by incorporating additional object annotations, existing methods often fail to handle images with multiple disparate objects. The main cause is that, during inference, they apply a global style to the whole image and do not consider the large style discrepancy between instance and background, or within instances. To address this problem, we propose a class-aware memory network that explicitly reasons about local style variations. A key-values memory structure, with a set of read/update operations, is introduced to record class-wise style variations and access them without requiring an object detector at test time. The key stores a domain-agnostic content representation for allocating memory items, while the values encode domain-specific style representations. We also present a feature contrastive loss to boost the discriminative power of memory items. We show that by incorporating our memory, we can transfer class-aware and accurate style representations across domains. Experimental results demonstrate that our model outperforms recent instance-level methods and achieves state-of-the-art performance.

* CVPR 2021
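
A rough sketch of what a read from such a key-value memory could look like: a content query is compared with the keys, and the matching style values are blended. The abstract does not give the read rule, so dot-product similarity with a softmax is an assumption here, the update operation is omitted, and `read_memory` and all vectors are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def read_memory(query, keys, values):
    """Return the attention-weighted sum of values for a content query."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

# Two hypothetical memory items: keys hold content, values hold style.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]

# A content query close to the first key retrieves mostly the first style.
style = read_memory([5.0, 0.0], keys, values)
```

Because the lookup depends only on the content features of each location, a read of this kind needs no object detector at test time, consistent with the property the abstract highlights.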

Few-Shot Unsupervised Image-to-Image Translation

May 05, 2019
Ming-Yu Liu, Xun Huang, Arun Mallya, Tero Karras, Timo Aila, Jaakko Lehtinen, Jan Kautz

Unsupervised image-to-image translation methods learn to map images in a given class to an analogous image in a different class, drawing on unstructured (non-registered) datasets of images. While remarkably successful, current methods require access to many images in both source and destination classes at training time. We argue this greatly limits their use. Drawing inspiration from the human capability of picking up the essence of a novel object from a small number of examples and generalizing from there, we seek a few-shot, unsupervised image-to-image translation algorithm that works on previously unseen target classes that are specified, at test time, only by a few example images. Our model achieves this few-shot generation capability by coupling an adversarial training scheme with a novel network design. Through extensive experimental validation and comparisons to several baseline methods on benchmark datasets, we verify the effectiveness of the proposed framework. Code will be available at https://nvlabs.github.io/FUNIT.

The Surprising Effectiveness of Linear Unsupervised Image-to-Image Translation

Jul 24, 2020
Eitan Richardson, Yair Weiss

Unsupervised image-to-image translation is an inherently ill-posed problem. Recent methods based on deep encoder-decoder architectures have shown impressive results, but we show that they only succeed due to a strong locality bias, and they fail to learn very simple nonlocal transformations (e.g., mapping upside-down faces to upright faces). When the locality bias is removed, the methods are too powerful and may fail to learn simple local transformations. In this paper, we introduce linear encoder-decoder architectures for unsupervised image-to-image translation. We show that learning is much easier and faster with these architectures, and yet the results are surprisingly effective. In particular, we show a number of local problems for which the results of the linear methods are comparable to those of state-of-the-art architectures but with a fraction of the training time, and a number of nonlocal problems for which the state-of-the-art fails while linear methods succeed.

* Preprint - under review
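
The abstract's example of a nonlocal transformation, turning an upside-down image upright, is itself a linear map: it permutes pixels, so it can be written as a matrix acting on the flattened image. The sketch below only demonstrates that fact on a toy image. It says nothing about how the paper learns such maps, and `flip_matrix` and `apply_linear` are hypothetical helper names.

```python
def flip_matrix(height, width):
    """Permutation matrix that flips a row-major flattened image upside down."""
    n = height * width
    matrix = [[0] * n for _ in range(n)]
    for row in range(height):
        for col in range(width):
            src = row * width + col
            dst = (height - 1 - row) * width + col
            matrix[dst][src] = 1  # output pixel dst copies input pixel src
    return matrix

def apply_linear(matrix, vector):
    """Matrix-vector product."""
    return [sum(m * v for m, v in zip(matrix_row, vector))
            for matrix_row in matrix]

# Toy 3x2 image, flattened row by row.
image = [1, 2,
         3, 4,
         5, 6]

flipped = apply_linear(flip_matrix(3, 2), image)
restored = apply_linear(flip_matrix(3, 2), flipped)
```

Every output pixel depends on an input pixel that may be far away in the image, which is exactly what a locality-biased architecture struggles to express, while a dense linear layer can represent it directly.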
