Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"Image To Image Translation": models, code, and papers

Gated SwitchGAN for multi-domain facial image translation

Nov 28, 2021
Xiaokang Zhang, Yuanlue Zhu, Wenting Chen, Wenshuang Liu, Linlin Shen

Recent studies on multi-domain facial image translation have achieved impressive results. The existing methods generally provide a discriminator with an auxiliary classifier to impose domain translation. However, these methods neglect important information regarding domain distribution matching. To solve this problem, we propose a switch generative adversarial network (SwitchGAN) with a more adaptive discriminator structure and a matched generator to perform delicate image translation among multiple domains. A feature-switching operation is proposed to achieve feature selection and fusion in our conditional modules. We demonstrate the effectiveness of our model. Furthermore, we also introduce a new capability of our generator that represents attribute intensity control and extracts content information without tailored training. Experiments on the Morph, RaFD and CelebA databases visually and quantitatively show that our extended SwitchGAN (i.e., Gated SwitchGAN) can achieve better translation results than StarGAN, AttGAN and STGAN. The attribute classification accuracy achieved using the trained ResNet-18 model and the FID score obtained using the ImageNet pretrained Inception-v3 model also quantitatively demonstrate the superior performance of our models.


Disentangled Unsupervised Image Translation via Restricted Information Flow

Nov 26, 2021
Ben Usman, Dina Bashkirova, Kate Saenko

Unsupervised image-to-image translation methods aim to map images from one domain into plausible examples from another domain while preserving structures shared across two domains. In the many-to-many setting, an additional guidance example from the target domain is used to determine domain-specific attributes of the generated image. In the absence of attribute annotations, methods have to infer which factors are specific to each domain from data during training. Many state-of-art methods hard-code the desired shared-vs-specific split into their architecture, severely restricting the scope of the problem. In this paper, we propose a new method that does not rely on such inductive architectural biases, and infers which attributes are domain-specific from data by constraining information flow through the network using translation honesty losses and a penalty on the capacity of domain-specific embedding. We show that the proposed method achieves consistently high manipulation accuracy across two synthetic and one natural dataset spanning a wide variety of domain-specific and shared attributes.


DCT-Net: Domain-Calibrated Translation for Portrait Stylization

Jul 06, 2022
Yifang Men, Yuan Yao, Miaomiao Cui, Zhouhui Lian, Xuansong Xie

This paper introduces DCT-Net, a novel image translation architecture for few-shot portrait stylization. Given limited style exemplars ($\sim$100), the new architecture can produce high-quality style transfer results with advanced ability to synthesize high-fidelity contents and strong generality to handle complicated scenes (e.g., occlusions and accessories). Moreover, it enables full-body image translation via one elegant evaluation network trained by partial observations (i.e., stylized heads). Few-shot learning based style transfer is challenging since the learned model can easily become overfitted in the target domain, due to the biased distribution formed by only a few training examples. This paper aims to handle the challenge by adopting the key idea of "calibration first, translation later" and exploring the augmented global structure with locally-focused translation. Specifically, the proposed DCT-Net consists of three modules: a content adapter borrowing the powerful prior from source photos to calibrate the content distribution of target samples; a geometry expansion module using affine transformations to release spatially semantic constraints; and a texture translation module leveraging samples produced by the calibrated distribution to learn a fine-grained conversion. Experimental results demonstrate the proposed method's superiority over the state of the art in head stylization and its effectiveness on full image translation with adaptive deformations.

* Accepted by SIGGRAPH 2022 (TOG). Project Page: , Code: 

Generating Multi-scale Maps from Remote Sensing Images via Series Generative Adversarial Networks

Mar 31, 2021
Xu Chen, Bangguo Yin, Songqiang Chen, Haifeng Li, Tian Xu

Considering the success of generative adversarial networks (GANs) for image-to-image translation, researchers have attempted to translate remote sensing images (RSIs) to maps (rs2map) through GAN for cartography. However, these studies involved limited scales, which hinders multi-scale map creation. By extending their method, multi-scale RSIs can be trivially translated to multi-scale maps (multi-scale rs2map translation) through scale-wise rs2map models trained for certain scales (parallel strategy). However, this strategy has two theoretical limitations. First, inconsistency between various spatial resolutions of multi-scale RSIs and object generalization on multi-scale maps (RS-m inconsistency) increasingly complicate the extraction of geographical information from RSIs for rs2map models with decreasing scale. Second, as rs2map translation is cross-domain, generators incur high computation costs to transform the RSI pixel distribution to that on maps. Thus, we designed a series strategy of generators for multi-scale rs2map translation to address these limitations. In this strategy, high-resolution RSIs are inputted to an rs2map model to output large-scale maps, which are translated to multi-scale maps through series multi-scale map translation models. The series strategy avoids RS-m inconsistency as inputs are high-resolution large-scale RSIs, and reduces the distribution gap in multi-scale map generation through similar pixel distributions among multi-scale maps. Our experimental results showed better quality multi-scale map generation with the series strategy, as shown by average increases of 11.69%, 53.78%, 55.42%, and 72.34% in the structural similarity index, edge structural similarity index, intersection over union (road), and intersection over union (water) for data from Mexico City and Tokyo at zoom level 17-13.


PI-Trans: Parallel-ConvMLP and Implicit-Transformation Based GAN for Cross-View Image Translation

Jul 09, 2022
Bin Ren, Hao Tang, Yiming Wang, Xia Li, Wei Wang, Nicu Sebe

For semantic-guided cross-view image translation, it is crucial to learn where to sample pixels from the source view image and where to reallocate them guided by the target view semantic map, especially when there is little overlap or drastic view difference between the source and target images. Hence, one not only needs to encode the long-range dependencies among pixels in both the source view image and target view the semantic map but also needs to translate these learned dependencies. To this end, we propose a novel generative adversarial network, PI-Trans, which mainly consists of a novel Parallel-ConvMLP module and an Implicit Transformation module at multiple semantic levels. Extensive experimental results show that the proposed PI-Trans achieves the best qualitative and quantitative performance by a large margin compared to the state-of-the-art methods on two challenging datasets. The code will be made available at

* 13 pages, 8 figures 

Diagonal Attention and Style-based GAN for Content-Style Disentanglement in Image Generation and Translation

Mar 30, 2021
Gihyun Kwon, Jong Chul Ye

One of the important research topics in image generative models is to disentangle the spatial contents and styles for their separate control. Although StyleGAN can generate content feature vectors from random noises, the resulting spatial content control is primarily intended for minor spatial variations, and the disentanglement of global content and styles is by no means complete. Inspired by a mathematical understanding of normalization and attention, here we present a novel hierarchical adaptive Diagonal spatial ATtention (DAT) layers to separately manipulate the spatial contents from styles in a hierarchical manner. Using DAT and AdaIN, our method enables coarse-to-fine level disentanglement of spatial contents and styles. In addition, our generator can be easily integrated into the GAN inversion framework so that the content and style of translated images from multi-domain image translation tasks can be flexibly controlled. By using various datasets, we confirm that the proposed method not only outperforms the existing models in disentanglement scores, but also provides more flexible control over spatial features in the generated images.


Bi-level Feature Alignment for Versatile Image Translation and Manipulation

Jul 07, 2021
Fangneng Zhan, Yingchen Yu, Rongliang Wu, Kaiwen Cui, Aoran Xiao, Shijian Lu, Ling Shao

Generative adversarial networks (GANs) have achieved great success in image translation and manipulation. However, high-fidelity image generation with faithful style control remains a grand challenge in computer vision. This paper presents a versatile image translation and manipulation framework that achieves accurate semantic and style guidance in image generation by explicitly building a correspondence. To handle the quadratic complexity incurred by building the dense correspondences, we introduce a bi-level feature alignment strategy that adopts a top-$k$ operation to rank block-wise features followed by dense attention between block features which reduces memory cost substantially. As the top-$k$ operation involves index swapping which precludes the gradient propagation, we propose to approximate the non-differentiable top-$k$ operation with a regularized earth mover's problem so that its gradient can be effectively back-propagated. In addition, we design a novel semantic position encoding mechanism that builds up coordinate for each individual semantic region to preserve texture structures while building correspondences. Further, we design a novel confidence feature injection module which mitigates mismatch problem by fusing features adaptively according to the reliability of built correspondences. Extensive experiments show that our method achieves superior performance qualitatively and quantitatively as compared with the state-of-the-art. The code is available at \href{}{}.

* Submitted to TPAMI 

Optimal translational-rotational invariant dictionaries for images

Sep 04, 2019
Davide Barbieri, Carlos Cabrelli, Eugenio Hernández, Ursula Molter

We provide the construction of a set of square matrices whose translates and rotates provide a Parseval frame that is optimal for approximating a given dataset of images. Our approach is based on abstract harmonic analysis techniques. Optimality is considered with respect to the quadratic error of approximation of the images in the dataset with their projection onto a linear subspace that is invariant under translations and rotations. In addition, we provide an elementary and fully self-contained proof of optimality, and the numerical results from datasets of natural images.


A Domain Gap Aware Generative Adversarial Network for Multi-domain Image Translation

Oct 21, 2021
Wenju Xu, Guanghui Wang

Recent image-to-image translation models have shown great success in mapping local textures between two domains. Existing approaches rely on a cycle-consistency constraint that supervises the generators to learn an inverse mapping. However, learning the inverse mapping introduces extra trainable parameters and it is unable to learn the inverse mapping for some domains. As a result, they are ineffective in the scenarios where (i) multiple visual image domains are involved; (ii) both structure and texture transformations are required; and (iii) semantic consistency is preserved. To solve these challenges, the paper proposes a unified model to translate images across multiple domains with significant domain gaps. Unlike previous models that constrain the generators with the ubiquitous cycle-consistency constraint to achieve the content similarity, the proposed model employs a perceptual self-regularization constraint. With a single unified generator, the model can maintain consistency over the global shapes as well as the local texture information across multiple domains. Extensive qualitative and quantitative evaluations demonstrate the effectiveness and superior performance over state-of-the-art models. It is more effective in representing shape deformation in challenging mappings with significant dataset variation across multiple domains.


Multi-modality super-resolution loss for GAN-based super-resolution of clinical CT images using micro CT image database

Dec 30, 2019
Tong Zheng, Hirohisa Oda, Takayasu Moriya, Shota Nakamura, Masahiro Oda, Masaki Mori, Horitsugu Takabatake, Hiroshi Natori, Kensaku Mori

This paper newly introduces multi-modality loss function for GAN-based super-resolution that can maintain image structure and intensity on unpaired training dataset of clinical CT and micro CT volumes. Precise non-invasive diagnosis of lung cancer mainly utilizes 3D multidetector computed-tomography (CT) data. On the other hand, we can take micro CT images of resected lung specimen in 50 micro meter or higher resolution. However, micro CT scanning cannot be applied to living human imaging. For obtaining highly detailed information such as cancer invasion area from pre-operative clinical CT volumes of lung cancer patients, super-resolution (SR) of clinical CT volumes to $\mu$CT level might be one of substitutive solutions. While most SR methods require paired low- and high-resolution images for training, it is infeasible to obtain precisely paired clinical CT and micro CT volumes. We aim to propose unpaired SR approaches for clincial CT using micro CT images based on unpaired image translation methods such as CycleGAN or UNIT. Since clinical CT and micro CT are very different in structure and intensity, direct application of GAN-based unpaired image translation methods in super-resolution tends to generate arbitrary images. Aiming to solve this problem, we propose new loss function called multi-modality loss function to maintain the similarity of input images and corresponding output images in super-resolution task. Experimental results demonstrated that the newly proposed loss function made CycleGAN and UNIT to successfully perform SR of clinical CT images of lung cancer patients into micro CT level resolution, while original CycleGAN and UNIT failed in super-resolution.

* 6 pages, 2 figures