Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"Image To Image Translation": models, code, and papers

Long-range medical image registration through generalized mutual information (GMI): toward a fully automatic volumetric alignment

Nov 30, 2020
Vinicius Pavanelli Vianna, Luiz Otavio Murta Jr

Image registration is a key operation in medical image processing, allowing a plethora of applications. Mutual information (MI) is consolidated as a robust similarity metric often used for medical image registration. Although MI provides a robust medical image registration, it usually fails when the needed image transform is too big due to MI local maxima traps. In this paper, we propose and evaluate a generalized parametric MI as an affine registration cost function. We assessed the generalized MI (GMI) functions for separable affine transforms and exhaustively evaluated the GMI mathematical image seeking the maximum registration range through a gradient descent simulation. We also employed Monte Carlo simulation essays for testing translation registering of randomized T1 versus T2 images. GMI functions showed to have smooth isosurfaces driving the algorithm to the global maxima. Results show significantly prolonged registration ranges, avoiding the traps of local maxima. We evaluated a range of [-150mm,150mm] for translations, [-180{\deg},180{\deg}] for rotations, [0.5,2] for scales, and [-1,1] for skew with a success rate of 99.99%, 97.58%, 99.99%, and 99.99% respectively for the transforms in the simulated gradient descent. We also obtained 99.75% success in Monte Carlo simulation from 2,000 randomized translations trials with 1,113 subjects T1 and T2 MRI images. The findings point towards the reliability of GMI for long-range registration with enhanced speed performance

* 13 pages, 8 figures, 3 tables 
Access Paper or Ask Questions

Gated SwitchGAN for multi-domain facial image translation

Nov 28, 2021
Xiaokang Zhang, Yuanlue Zhu, Wenting Chen, Wenshuang Liu, Linlin Shen

Recent studies on multi-domain facial image translation have achieved impressive results. The existing methods generally provide a discriminator with an auxiliary classifier to impose domain translation. However, these methods neglect important information regarding domain distribution matching. To solve this problem, we propose a switch generative adversarial network (SwitchGAN) with a more adaptive discriminator structure and a matched generator to perform delicate image translation among multiple domains. A feature-switching operation is proposed to achieve feature selection and fusion in our conditional modules. We demonstrate the effectiveness of our model. Furthermore, we also introduce a new capability of our generator that represents attribute intensity control and extracts content information without tailored training. Experiments on the Morph, RaFD and CelebA databases visually and quantitatively show that our extended SwitchGAN (i.e., Gated SwitchGAN) can achieve better translation results than StarGAN, AttGAN and STGAN. The attribute classification accuracy achieved using the trained ResNet-18 model and the FID score obtained using the ImageNet pretrained Inception-v3 model also quantitatively demonstrate the superior performance of our models.

Access Paper or Ask Questions

Disentangled Unsupervised Image Translation via Restricted Information Flow

Nov 26, 2021
Ben Usman, Dina Bashkirova, Kate Saenko

Unsupervised image-to-image translation methods aim to map images from one domain into plausible examples from another domain while preserving structures shared across two domains. In the many-to-many setting, an additional guidance example from the target domain is used to determine domain-specific attributes of the generated image. In the absence of attribute annotations, methods have to infer which factors are specific to each domain from data during training. Many state-of-art methods hard-code the desired shared-vs-specific split into their architecture, severely restricting the scope of the problem. In this paper, we propose a new method that does not rely on such inductive architectural biases, and infers which attributes are domain-specific from data by constraining information flow through the network using translation honesty losses and a penalty on the capacity of domain-specific embedding. We show that the proposed method achieves consistently high manipulation accuracy across two synthetic and one natural dataset spanning a wide variety of domain-specific and shared attributes.

Access Paper or Ask Questions

Generating Multi-scale Maps from Remote Sensing Images via Series Generative Adversarial Networks

Mar 31, 2021
Xu Chen, Bangguo Yin, Songqiang Chen, Haifeng Li, Tian Xu

Considering the success of generative adversarial networks (GANs) for image-to-image translation, researchers have attempted to translate remote sensing images (RSIs) to maps (rs2map) through GAN for cartography. However, these studies involved limited scales, which hinders multi-scale map creation. By extending their method, multi-scale RSIs can be trivially translated to multi-scale maps (multi-scale rs2map translation) through scale-wise rs2map models trained for certain scales (parallel strategy). However, this strategy has two theoretical limitations. First, inconsistency between various spatial resolutions of multi-scale RSIs and object generalization on multi-scale maps (RS-m inconsistency) increasingly complicate the extraction of geographical information from RSIs for rs2map models with decreasing scale. Second, as rs2map translation is cross-domain, generators incur high computation costs to transform the RSI pixel distribution to that on maps. Thus, we designed a series strategy of generators for multi-scale rs2map translation to address these limitations. In this strategy, high-resolution RSIs are inputted to an rs2map model to output large-scale maps, which are translated to multi-scale maps through series multi-scale map translation models. The series strategy avoids RS-m inconsistency as inputs are high-resolution large-scale RSIs, and reduces the distribution gap in multi-scale map generation through similar pixel distributions among multi-scale maps. Our experimental results showed better quality multi-scale map generation with the series strategy, as shown by average increases of 11.69%, 53.78%, 55.42%, and 72.34% in the structural similarity index, edge structural similarity index, intersection over union (road), and intersection over union (water) for data from Mexico City and Tokyo at zoom level 17-13.

Access Paper or Ask Questions

Diagonal Attention and Style-based GAN for Content-Style Disentanglement in Image Generation and Translation

Mar 30, 2021
Gihyun Kwon, Jong Chul Ye

One of the important research topics in image generative models is to disentangle the spatial contents and styles for their separate control. Although StyleGAN can generate content feature vectors from random noises, the resulting spatial content control is primarily intended for minor spatial variations, and the disentanglement of global content and styles is by no means complete. Inspired by a mathematical understanding of normalization and attention, here we present a novel hierarchical adaptive Diagonal spatial ATtention (DAT) layers to separately manipulate the spatial contents from styles in a hierarchical manner. Using DAT and AdaIN, our method enables coarse-to-fine level disentanglement of spatial contents and styles. In addition, our generator can be easily integrated into the GAN inversion framework so that the content and style of translated images from multi-domain image translation tasks can be flexibly controlled. By using various datasets, we confirm that the proposed method not only outperforms the existing models in disentanglement scores, but also provides more flexible control over spatial features in the generated images.

Access Paper or Ask Questions

Bi-level Feature Alignment for Versatile Image Translation and Manipulation

Jul 07, 2021
Fangneng Zhan, Yingchen Yu, Rongliang Wu, Kaiwen Cui, Aoran Xiao, Shijian Lu, Ling Shao

Generative adversarial networks (GANs) have achieved great success in image translation and manipulation. However, high-fidelity image generation with faithful style control remains a grand challenge in computer vision. This paper presents a versatile image translation and manipulation framework that achieves accurate semantic and style guidance in image generation by explicitly building a correspondence. To handle the quadratic complexity incurred by building the dense correspondences, we introduce a bi-level feature alignment strategy that adopts a top-$k$ operation to rank block-wise features followed by dense attention between block features which reduces memory cost substantially. As the top-$k$ operation involves index swapping which precludes the gradient propagation, we propose to approximate the non-differentiable top-$k$ operation with a regularized earth mover's problem so that its gradient can be effectively back-propagated. In addition, we design a novel semantic position encoding mechanism that builds up coordinate for each individual semantic region to preserve texture structures while building correspondences. Further, we design a novel confidence feature injection module which mitigates mismatch problem by fusing features adaptively according to the reliability of built correspondences. Extensive experiments show that our method achieves superior performance qualitatively and quantitatively as compared with the state-of-the-art. The code is available at \href{}{}.

* Submitted to TPAMI 
Access Paper or Ask Questions

Optimal translational-rotational invariant dictionaries for images

Sep 04, 2019
Davide Barbieri, Carlos Cabrelli, Eugenio Hernández, Ursula Molter

We provide the construction of a set of square matrices whose translates and rotates provide a Parseval frame that is optimal for approximating a given dataset of images. Our approach is based on abstract harmonic analysis techniques. Optimality is considered with respect to the quadratic error of approximation of the images in the dataset with their projection onto a linear subspace that is invariant under translations and rotations. In addition, we provide an elementary and fully self-contained proof of optimality, and the numerical results from datasets of natural images.

Access Paper or Ask Questions

A Domain Gap Aware Generative Adversarial Network for Multi-domain Image Translation

Oct 21, 2021
Wenju Xu, Guanghui Wang

Recent image-to-image translation models have shown great success in mapping local textures between two domains. Existing approaches rely on a cycle-consistency constraint that supervises the generators to learn an inverse mapping. However, learning the inverse mapping introduces extra trainable parameters and it is unable to learn the inverse mapping for some domains. As a result, they are ineffective in the scenarios where (i) multiple visual image domains are involved; (ii) both structure and texture transformations are required; and (iii) semantic consistency is preserved. To solve these challenges, the paper proposes a unified model to translate images across multiple domains with significant domain gaps. Unlike previous models that constrain the generators with the ubiquitous cycle-consistency constraint to achieve the content similarity, the proposed model employs a perceptual self-regularization constraint. With a single unified generator, the model can maintain consistency over the global shapes as well as the local texture information across multiple domains. Extensive qualitative and quantitative evaluations demonstrate the effectiveness and superior performance over state-of-the-art models. It is more effective in representing shape deformation in challenging mappings with significant dataset variation across multiple domains.

Access Paper or Ask Questions

Multi-modality super-resolution loss for GAN-based super-resolution of clinical CT images using micro CT image database

Dec 30, 2019
Tong Zheng, Hirohisa Oda, Takayasu Moriya, Shota Nakamura, Masahiro Oda, Masaki Mori, Horitsugu Takabatake, Hiroshi Natori, Kensaku Mori

This paper newly introduces multi-modality loss function for GAN-based super-resolution that can maintain image structure and intensity on unpaired training dataset of clinical CT and micro CT volumes. Precise non-invasive diagnosis of lung cancer mainly utilizes 3D multidetector computed-tomography (CT) data. On the other hand, we can take micro CT images of resected lung specimen in 50 micro meter or higher resolution. However, micro CT scanning cannot be applied to living human imaging. For obtaining highly detailed information such as cancer invasion area from pre-operative clinical CT volumes of lung cancer patients, super-resolution (SR) of clinical CT volumes to $\mu$CT level might be one of substitutive solutions. While most SR methods require paired low- and high-resolution images for training, it is infeasible to obtain precisely paired clinical CT and micro CT volumes. We aim to propose unpaired SR approaches for clincial CT using micro CT images based on unpaired image translation methods such as CycleGAN or UNIT. Since clinical CT and micro CT are very different in structure and intensity, direct application of GAN-based unpaired image translation methods in super-resolution tends to generate arbitrary images. Aiming to solve this problem, we propose new loss function called multi-modality loss function to maintain the similarity of input images and corresponding output images in super-resolution task. Experimental results demonstrated that the newly proposed loss function made CycleGAN and UNIT to successfully perform SR of clinical CT images of lung cancer patients into micro CT level resolution, while original CycleGAN and UNIT failed in super-resolution.

* 6 pages, 2 figures 
Access Paper or Ask Questions

Image-to-Image Translation: Methods and Applications

Jan 21, 2021
Yingxue Pang, Jianxin Lin, Tao Qin, Zhibo Chen

Image-to-image translation (I2I) aims to transfer images from a source domain to a target domain while preserving the content representations. I2I has drawn increasing attention and made tremendous progress in recent years because of its wide range of applications in many computer vision and image processing problems, such as image synthesis, segmentation, style transfer, restoration, and pose estimation. In this paper, we provide an overview of the I2I works developed in recent years. We will analyze the key techniques of the existing I2I works and clarify the main progress the community has made. Additionally, we will elaborate on the effect of I2I on the research and industry community and point out remaining challenges in related fields.

* 19 pages, 17 figures 
Access Paper or Ask Questions