Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"Image To Image Translation": models, code, and papers

Adversarial Uni- and Multi-modal Stream Networks for Multimodal Image Registration

Jul 06, 2020
Zhe Xu, Jie Luo, Jiangpeng Yan, Ritvik Pulya, Xiu Li, William Wells III, Jayender Jagadeesan

Deformable image registration between Computed Tomography (CT) images and Magnetic Resonance (MR) imaging is essential for many image-guided therapies. In this paper, we propose a novel translation-based unsupervised deformable image registration method. Distinct from other translation-based methods that attempt to convert the multimodal problem (e.g., CT-to-MR) into a unimodal problem (e.g., MR-to-MR) via image-to-image translation, our method leverages the deformation fields estimated from both: (i) the translated MR image and (ii) the original CT image in a dual-stream fashion, and automatically learns how to fuse them to achieve better registration performance. The multimodal registration network can be effectively trained by computationally efficient similarity metrics without any ground-truth deformation. Our method has been evaluated on two clinical datasets and demonstrates promising results compared to state-of-the-art traditional and learning-based methods.

* accepted by MICCAI 2020 

Thermal Infrared Image Colorization for Nighttime Driving Scenes with Top-Down Guided Attention

Apr 29, 2021
Fuya Luo, Yunhan Li, Guang Zeng, Peng Peng, Gang Wang, Yongjie Li

Benefitting from insensitivity to light and high penetration of foggy environments, infrared cameras are widely used for sensing in nighttime traffic scenes. However, the low contrast and lack of chromaticity of thermal infrared (TIR) images hinder the human interpretation and portability of high-level computer vision algorithms. Colorization to translate a nighttime TIR image into a daytime color (NTIR2DC) image may be a promising way to facilitate nighttime scene perception. Despite recent impressive advances in image translation, semantic encoding entanglement and geometric distortion in the NTIR2DC task remain under-addressed. Hence, we propose a toP-down attEntion And gRadient aLignment based GAN, referred to as PearlGAN. A top-down guided attention module and an elaborate attentional loss are first designed to reduce the semantic encoding ambiguity during translation. Then, a structured gradient alignment loss is introduced to encourage edge consistency between the translated and input images. In addition, pixel-level annotation is carried out on a subset of FLIR and KAIST datasets to evaluate the semantic preservation performance of multiple translation methods. Furthermore, a new metric is devised to evaluate the geometric consistency in the translation process. Extensive experiments demonstrate the superiority of the proposed PearlGAN over other image translation methods for the NTIR2DC task. The source code and labeled segmentation masks will be available at \url{}.

* A Manuscript Submitted to IEEE Transactions on Intelligent Transpotation Systems 

Video-to-Video Translation for Visual Speech Synthesis

May 28, 2019
Michail C. Doukas, Viktoriia Sharmanska, Stefanos Zafeiriou

Despite remarkable success in image-to-image translation that celebrates the advancements of generative adversarial networks (GANs), very limited attempts are known for video domain translation. We study the task of video-to-video translation in the context of visual speech generation, where the goal is to transform an input video of any spoken word to an output video of a different word. This is a multi-domain translation, where each word forms a domain of videos uttering this word. Adaptation of the state-of-the-art image-to-image translation model (StarGAN) to this setting falls short with a large vocabulary size. Instead we propose to use character encodings of the words and design a novel character-based GANs architecture for video-to-video translation called Visual Speech GAN (ViSpGAN). We are the first to demonstrate video-to-video translation with a vocabulary of 500 words.


BargainNet: Background-Guided Domain Translation for Image Harmonization

Sep 19, 2020
Wenyan Cong, Li Niu, Jianfu Zhang, Jing Liang, Liqing Zhang

Image composition is a fundamental operation in image editing field. However, unharmonious foreground and background downgrade the quality of composite image. Image harmonization, which adjusts the foreground to improve the consistency, is an essential yet challenging task. Previous deep learning based methods mainly focus on directly learning the mapping from composite image to real image, while ignoring the crucial guidance role that background plays. In this work, with the assumption that the foreground needs to be translated to the same domain as background, we formulate image harmonization task as background-guided domain translation. Therefore, we propose an image harmonization network with a novel domain code extractor and well-tailored triplet losses, which could capture the background domain information to guide the foreground harmonization. Extensive experiments on the existing image harmonization benchmark demonstrate the effectiveness of our proposed method.


Multi-Modality Image Inpainting using Generative Adversarial Networks

Jun 22, 2022
Aref Abedjooy, Mehran Ebrahimi

Deep learning techniques, especially Generative Adversarial Networks (GANs) have significantly improved image inpainting and image-to-image translation tasks over the past few years. To the best of our knowledge, the problem of combining the image inpainting task with the multi-modality image-to-image translation remains intact. In this paper, we propose a model to address this problem. The model will be evaluated on combined night-to-day image translation and inpainting, along with promising qualitative and quantitative results.

* to be published in the Proceedings of 26th Int'l Conf on Image Processing, Computer Vision, & Pattern Recognition (IPCV), July 2022 

Adversarial Image Translation: Unrestricted Adversarial Examples in Face Recognition Systems

May 27, 2019
Kazuya Kakizaki, Kosuke Yoshida

Thanks to recent advances in Deep Neural Networks (DNNs), face recognition systems have achieved high accuracy in classification of a large number of face images. However, recent works demonstrate that DNNs could be vulnerable to adversarial examples and raise concerns about robustness of face recognition systems. In particular adversarial examples that are not restricted to small perturbations could be more serious risks since conventional certified defenses might be ineffective against them. To shed light on the vulnerability to this type of adversarial examples, we propose a flexible and efficient method to generate unrestricted adversarial examples using image translation techniques. Our method enables us to translate a source image into any desired facial appearance with large perturbations so that target face recognition systems could be deceived. Through our experiments, we demonstrate that our method achieves about 90% and 30% attack success rates under a white- and black-box setting, respectively. We also illustrate that our translated images are perceptually realistic and maintain personal identity while the perturbations are large enough to bypass certified defenses.

* Kazuya Kakizaki and Kosuke Yoshida share equal contributions 

Bridging the Gap between Label- and Reference-based Synthesis in Multi-attribute Image-to-Image Translation

Oct 11, 2021
Qiusheng Huang, Zhilin Zheng, Xueqi Hu, Li Sun, Qingli Li

The image-to-image translation (I2IT) model takes a target label or a reference image as the input, and changes a source into the specified target domain. The two types of synthesis, either label- or reference-based, have substantial differences. Particularly, the label-based synthesis reflects the common characteristics of the target domain, and the reference-based shows the specific style similar to the reference. This paper intends to bridge the gap between them in the task of multi-attribute I2IT. We design the label- and reference-based encoding modules (LEM and REM) to compare the domain differences. They first transfer the source image and target label (or reference) into a common embedding space, by providing the opposite directions through the attribute difference vector. Then the two embeddings are simply fused together to form the latent code S_rand (or S_ref), reflecting the domain style differences, which is injected into each layer of the generator by SPADE. To link LEM and REM, so that two types of results benefit each other, we encourage the two latent codes to be close, and set up the cycle consistency between the forward and backward translations on them. Moreover, the interpolation between the S_rand and S_ref is also used to synthesize an extra image. Experiments show that label- and reference-based synthesis are indeed mutually promoted, so that we can have the diverse results from LEM, and high quality results with the similar style of the reference.

* accepted by ICCV 2021 

Protecting Against Image Translation Deepfakes by Leaking Universal Perturbations from Black-Box Neural Networks

Jun 11, 2020
Nataniel Ruiz, Sarah Adel Bargal, Stan Sclaroff

In this work, we develop efficient disruptions of black-box image translation deepfake generation systems. We are the first to demonstrate black-box deepfake generation disruption by presenting image translation formulations of attacks initially proposed for classification models. Nevertheless, a naive adaptation of classification black-box attacks results in a prohibitive number of queries for image translation systems in the real-world. We present a frustratingly simple yet highly effective algorithm Leaking Universal Perturbations (LUP), that significantly reduces the number of queries needed to attack an image. LUP consists of two phases: (1) a short leaking phase where we attack the network using traditional black-box attacks and gather information on successful attacks on a small dataset and (2) and an exploitation phase where we leverage said information to subsequently attack the network with improved efficiency. Our attack reduces the total number of queries necessary to attack GANimation and StarGAN by 30%.


Depth Estimation from Single-shot Monocular Endoscope Image Using Image Domain Adaptation And Edge-Aware Depth Estimation

Jan 12, 2022
Masahiro Oda, Hayato Itoh, Kiyohito Tanaka, Hirotsugu Takabatake, Masaki Mori, Hiroshi Natori, Kensaku Mori

We propose a depth estimation method from a single-shot monocular endoscopic image using Lambertian surface translation by domain adaptation and depth estimation using multi-scale edge loss. We employ a two-step estimation process including Lambertian surface translation from unpaired data and depth estimation. The texture and specular reflection on the surface of an organ reduce the accuracy of depth estimations. We apply Lambertian surface translation to an endoscopic image to remove these texture and reflections. Then, we estimate the depth by using a fully convolutional network (FCN). During the training of the FCN, improvement of the object edge similarity between an estimated image and a ground truth depth image is important for getting better results. We introduced a muti-scale edge loss function to improve the accuracy of depth estimation. We quantitatively evaluated the proposed method using real colonoscopic images. The estimated depth values were proportional to the real depth values. Furthermore, we applied the estimated depth images to automated anatomical location identification of colonoscopic images using a convolutional neural network. The identification accuracy of the network improved from 69.2% to 74.1% by using the estimated depth images.

* Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization, 2021 
* Accepted paper as an oral presentation at Joint MICCAI workshop 2021, AE-CAI/CARE/OR2.0 

XGAN: Unsupervised Image-to-Image Translation for Many-to-Many Mappings

Jul 10, 2018
Amélie Royer, Konstantinos Bousmalis, Stephan Gouws, Fred Bertsch, Inbar Mosseri, Forrester Cole, Kevin Murphy

Style transfer usually refers to the task of applying color and texture information from a specific style image to a given content image while preserving the structure of the latter. Here we tackle the more generic problem of semantic style transfer: given two unpaired collections of images, we aim to learn a mapping between the corpus-level style of each collection, while preserving semantic content shared across the two domains. We introduce XGAN ("Cross-GAN"), a dual adversarial autoencoder, which captures a shared representation of the common domain semantic content in an unsupervised way, while jointly learning the domain-to-domain image translations in both directions. We exploit ideas from the domain adaptation literature and define a semantic consistency loss which encourages the model to preserve semantics in the learned embedding space. We report promising qualitative results for the task of face-to-cartoon translation. The cartoon dataset, CartoonSet, we collected for this purpose is publicly available at as a new benchmark for semantic style transfer.

* Domain Adaptation for Visual Understanding at ICML'18