"Image To Image Translation": models, code, and papers

Unsupervised Image Translation using Adversarial Networks for Improved Plant Disease Recognition

Sep 26, 2019
Haseeb Nazki, Sook Yoon, Alvaro Fuentes, Dong Sun Park

Acquisition of data in task-specific applications of machine learning like plant disease recognition is a costly endeavor owing to the requirements of professional human diligence and time constraints. In this paper, we present a simple pipeline that uses GANs in an unsupervised image translation environment to improve learning with respect to the data distribution in a plant disease dataset, reducing the partiality introduced by acute class imbalance and hence shifting the classification decision boundary towards better performance. The empirical analysis of our method is demonstrated on a limited dataset of 2789 tomato plant disease images, highly corrupted with an imbalance in the 9 disease categories. First, we extend the state of the art for the GAN-based image-to-image translation method by enhancing the perceptual quality of the generated images and preserving the semantics. We introduce AR-GAN, where in addition to the adversarial loss, our synthetic image generator optimizes an Activation Reconstruction Loss (ARL) function that optimizes feature activations against the natural image. We present visually more compelling synthetic images in comparison to the most prominent existing models and evaluate the performance of our GAN framework in terms of various datasets and metrics. Second, we evaluate the performance of a baseline convolutional neural network classifier for improved recognition using the resulting synthetic samples to augment our training set and compare it with the classical data augmentation scheme. We observe a significant improvement in classification accuracy (+5.2%) using generated synthetic samples, compared with a +0.8% increase using classic augmentation in an equal class distribution environment.

* 20 pages, 11 figures, 3 tables, article under review 
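
The abstract above describes a generator objective that combines an adversarial loss with an activation reconstruction term. A minimal sketch of that combination follows. It assumes the reconstruction term is a mean squared difference between feature activations of the natural and the synthetic image and that the two terms are mixed with a scalar weight; the function names and the `arl_weight` parameter are hypothetical and are not taken from the paper.

```python
def activation_reconstruction_loss(feats_real, feats_fake):
    """Mean squared difference between paired feature activations.

    Each argument is a list of flattened feature maps (lists of floats),
    one per layer. Using a squared error here is an assumption.
    """
    total, count = 0.0, 0
    for layer_real, layer_fake in zip(feats_real, feats_fake):
        if len(layer_real) != len(layer_fake):
            raise ValueError("feature maps must have matching sizes")
        total += sum((r - f) ** 2 for r, f in zip(layer_real, layer_fake))
        count += len(layer_real)
    return total / count


def generator_loss(adv_loss, feats_real, feats_fake, arl_weight=1.0):
    """Adversarial loss plus a weighted activation reconstruction term."""
    arl = activation_reconstruction_loss(feats_real, feats_fake)
    return adv_loss + arl_weight * arl


# Toy activations from one layer of a (hypothetical) feature extractor.
loss = generator_loss(0.5, [[1.0, 2.0]], [[1.0, 4.0]], arl_weight=0.1)
```

In a real training loop the activations would come from a network applied to both images, and the weight would be tuned as a hyperparameter.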

Contextual-based Image Inpainting: Infer, Match, and Translate

Jul 25, 2018
Yuhang Song, Chao Yang, Zhe Lin, Xiaofeng Liu, Qin Huang, Hao Li, C. -C. Jay Kuo

We study the task of image inpainting, which is to fill in the missing region of an incomplete image with plausible contents. To this end, we propose a learning-based approach to generate visually coherent completion given a high-resolution image with missing components. In order to overcome the difficulty to directly learn the distribution of high-dimensional image data, we divide the task into inference and translation as two separate steps and model each step with a deep neural network. We also use simple heuristics to guide the propagation of local textures from the boundary to the hole. We show that, by using such techniques, inpainting reduces to the problem of learning two image-feature translation functions in much smaller space and hence easier to train. We evaluate our method on several public datasets and show that we generate results of better visual quality than previous state-of-the-art methods.

* ECCV 2018 camera ready 

Unified Attentional Generative Adversarial Network for Brain Tumor Segmentation From Multimodal Unpaired Images

Jul 08, 2019
Wenguang Yuan, Jia Wei, Jiabing Wang, Qianli Ma, Tolga Tasdizen

In medical applications, the same anatomical structures may be observed in multiple modalities despite the different image characteristics. Currently, most deep models for multimodal segmentation rely on paired registered images. However, multimodal paired registered images are difficult to obtain in many cases. Therefore, developing a model that can segment the target objects from different modalities with unpaired images is significant for many clinical applications. In this work, we propose a novel two-stream translation and segmentation unified attentional generative adversarial network (UAGAN), which can perform any-to-any image modality translation and segment the target objects simultaneously in the case where two or more modalities are available. The translation stream is used to capture modality-invariant features of the target anatomical structures. In addition, to focus on segmentation-related features, we add attentional blocks to extract valuable features from the translation stream. Experiments on three-modality brain tumor segmentation indicate that UAGAN outperforms the existing methods in most cases.

* 9 pages, 4 figures, Accepted by MICCAI2019 

Zero-Pair Image to Image Translation using Domain Conditional Normalization

Nov 11, 2020
Samarth Shukla, Andrés Romero, Luc Van Gool, Radu Timofte

In this paper, we propose an approach based on domain conditional normalization (DCN) for zero-pair image-to-image translation, i.e., translating between two domains which have no paired training data available but each have paired training data with a third domain. We employ a single generator which has an encoder-decoder structure and analyze different implementations of domain conditional normalization to obtain the desired target domain output. The validation benchmark uses RGB-depth pairs and RGB-semantic pairs for training and compares performance for the depth-semantic translation task. The proposed approaches improve in qualitative and quantitative terms over the compared methods, while using much fewer parameters. Code available at

* Paper accepted for publication at WACV 2021 
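
To make the idea of normalization conditioned on a domain more concrete, here is a minimal sketch of one generic variant: a feature channel is normalized to zero mean and unit variance, then rescaled and shifted with parameters selected by the target domain. The abstract says several implementations were analyzed, so this is not necessarily the one the authors use; the `DOMAIN_PARAMS` table and its values are hypothetical placeholders for learned parameters.

```python
import math

# Hypothetical per-domain (scale, shift) pairs; in practice these are learned.
DOMAIN_PARAMS = {
    "rgb": (1.0, 0.0),
    "depth": (0.5, 2.0),
    "semantic": (2.0, -1.0),
}


def domain_conditional_norm(channel, domain, eps=1e-5):
    """Normalize one feature channel, then apply the target domain's affine map."""
    mean = sum(channel) / len(channel)
    var = sum((x - mean) ** 2 for x in channel) / len(channel)
    gamma, beta = DOMAIN_PARAMS[domain]
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in channel]


# The same features steered toward two different target domains.
features = [1.0, 2.0, 3.0]
as_depth = domain_conditional_norm(features, "depth")
as_semantic = domain_conditional_norm(features, "semantic")
```

Because only the small affine parameters depend on the domain, a single encoder-decoder generator can be shared across all translation directions.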

Remote Sensing Image Translation via Style-Based Recalibration Module and Improved Style Discriminator

Mar 29, 2021
Tiange Zhang, Feng Gao, Junyu Dong, Qian Du

Existing remote sensing change detection methods are heavily affected by seasonal variation. Since vegetation colors are different between winter and summer, such variations are inclined to be falsely detected as changes. In this letter, we propose an image translation method to solve the problem. A style-based recalibration module is introduced to capture seasonal features effectively. Then, a new style discriminator is designed to improve the translation performance. The discriminator can not only produce a decision for the fake or real sample, but also return a style vector according to the channel-wise correlations. Extensive experiments are conducted on a season-varying dataset. The experimental results show that the proposed method can effectively perform image translation, thereby consistently improving the season-varying image change detection performance. Our codes and data are available at

* Accepted by IEEE Geoscience and Remote Sensing Letters, Code: 

High Resolution Solar Image Generation using Generative Adversarial Networks

Jun 07, 2021
Ankan Dash, Junyi Ye, Guiling Wang

We applied the deep learning algorithm known as Generative Adversarial Networks (GANs) to perform solar image-to-image translation, that is, from Solar Dynamics Observatory (SDO)/Helioseismic and Magnetic Imager (HMI) line-of-sight magnetogram images to SDO/Atmospheric Imaging Assembly (AIA) 0304-Å images. Ultraviolet (UV)/Extreme Ultraviolet (EUV) observations like the SDO/AIA0304-Å images were only made available to scientists in the late 1990s, even though magnetic field observations like the SDO/HMI have been available since the 1970s. Therefore, by leveraging deep learning algorithms like GANs, we can give scientists access to complete datasets for analysis. For generating high-resolution solar images we use the Pix2PixHD and Pix2Pix algorithms. The Pix2PixHD algorithm was specifically designed for high-resolution image generation tasks, and the Pix2Pix algorithm is by far the most widely used image-to-image translation algorithm. For training and testing we used data from the years 2012, 2013 and 2014. The results show that our deep learning models are capable of generating high-resolution (1024 x 1024 pixels) AIA0304 images from HMI magnetograms. Specifically, the pixel-to-pixel Pearson Correlation Coefficient between the images generated by Pix2PixHD and the original images is as high as 0.99. The number is 0.962 if Pix2Pix is used to generate images. The results we get for our Pix2PixHD model are better than the results obtained in previous work by others to generate AIA0304 images. Thus, when the AIA0304 data is not available, we can use these models to generate AIA0304 images, which can be used for understanding space weather and giving researchers the capability to predict solar events such as solar flares and coronal mass ejections. As far as we know, our work is the first attempt to leverage the Pix2PixHD algorithm for SDO/HMI to SDO/AIA0304 image-to-image translation.

* 10 pages, 3 figures 
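
The evaluation metric quoted in the abstract above, a pixel-to-pixel Pearson correlation coefficient between a generated and an original image, can be sketched as follows. This is an illustration of the standard formula on tiny toy arrays, not the authors' evaluation code; the function name and the sample values are made up for the example.

```python
import math


def pearson_cc(image_a, image_b):
    """Pixel-to-pixel Pearson correlation between two equally sized 2-D images."""
    a = [p for row in image_a for p in row]
    b = [p for row in image_b for p in row]
    if len(a) != len(b) or not a:
        raise ValueError("images must be non-empty and the same size")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant image")
    return cov / math.sqrt(var_a * var_b)


# Toy 2 x 2 "images": the generated one is the original plus a constant offset.
original = [[0.0, 1.0], [2.0, 3.0]]
generated = [[0.1, 1.1], [2.1, 3.1]]
score = pearson_cc(original, generated)
```

Note that the coefficient is insensitive to a constant brightness offset or a positive rescaling, so a high value indicates matching spatial structure rather than matching absolute intensities.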

Bi-Directional Domain Translation for Zero-Shot Sketch-Based Image Retrieval

Nov 29, 2019
Jiangtong Li, Zhixin Ling, Li Niu, Liqing Zhang

The goal of Sketch-Based Image Retrieval (SBIR) is using free-hand sketches to retrieve images of the same category from a natural image gallery. However, SBIR requires all categories to be seen during training, which cannot be guaranteed in real-world applications. So we investigate more challenging Zero-Shot SBIR (ZS-SBIR), in which test categories do not appear in the training stage. Traditional SBIR methods are prone to be category-based retrieval and cannot generalize well from seen categories to unseen ones. In contrast, we disentangle image features into structure features and appearance features to facilitate structure-based retrieval. To assist feature disentanglement and take full advantage of disentangled information, we propose a Bi-directional Domain Translation (BDT) framework for ZS-SBIR, in which the image domain and sketch domain can be translated to each other through disentangled structure and appearance features. Finally, we perform retrieval in both structure feature space and image feature space. Extensive experiments demonstrate that our proposed approach remarkably outperforms state-of-the-art approaches by about 8% on the Sketchy dataset and over 5% on the TU-Berlin dataset.


Semantically Adaptive Image-to-image Translation for Domain Adaptation of Semantic Segmentation

Sep 02, 2020
Luigi Musto, Andrea Zinelli

Domain shift is a very challenging problem for semantic segmentation. Any model can be easily trained on synthetic data, where images and labels are artificially generated, but it will perform poorly when deployed on real environments. In this paper, we address the problem of domain adaptation for semantic segmentation of street scenes. Many state-of-the-art approaches focus on translating the source image while imposing that the result should be semantically consistent with the input. However, we advocate that the image semantics can also be exploited to guide the translation algorithm. To this end, we rethink the generative model to enforce this assumption and strengthen the connection between pixel-level and feature-level domain alignment. We conduct extensive experiments by training common semantic segmentation models with our method and show that the results we obtain on the synthetic-to-real benchmarks surpass the state-of-the-art.

* Paper will appear on BMVC 2020 

High-throughput lensless whole slide imaging via continuous height-varying modulation of tilted sensor

Sep 28, 2021
Shaowei Jiang, Chengfei Guo, Patrick Hu, Derek Hu, Pengming Song, Tianbo Wang, Zichao Bian, Zibang Zhang, Guoan Zheng

We report a new lensless microscopy configuration by integrating the concepts of transverse translational ptychography and defocus multi-height phase retrieval. In this approach, we place a tilted image sensor under the specimen for linearly-increasing phase modulation along one lateral direction. Similar to the operation of ptychography, we laterally translate the specimen and acquire the diffraction images for reconstruction. Since the axial distance between the specimen and the sensor varies at different lateral positions, laterally translating the specimen effectively introduces defocus multi-height measurements while eliminating axial scanning. Lateral translation further introduces sub-pixel shift for pixel super-resolution imaging and naturally expands the field of view for rapid whole slide imaging. We show that the equivalent height variation can be precisely estimated from the lateral shift of the specimen, thereby addressing the challenge of precise axial positioning in conventional multi-height phase retrieval. Using a sensor with a 1.67-micron pixel size, our low-cost and field-portable prototype can resolve 690-nm linewidth on the resolution target. We show that a whole slide image of a blood smear with a 120-mm^2 field of view can be acquired in 18 seconds. We also demonstrate accurate automatic white blood cell counting from the recovered image. The reported approach may provide a turnkey solution for addressing point-of-care- and telemedicine-related challenges.


Layout-to-Image Translation with Double Pooling Generative Adversarial Networks

Aug 29, 2021
Hao Tang, Nicu Sebe

In this paper, we address the task of layout-to-image translation, which aims to translate an input semantic layout to a realistic image. One open challenge widely observed in existing methods is the lack of effective semantic constraints during the image translation process, leading to models that cannot preserve the semantic information and ignore the semantic dependencies within the same object. To address this issue, we propose a novel Double Pooling GAN (DPGAN) for generating photo-realistic and semantically-consistent results from the input layout. We also propose a novel Double Pooling Module (DPM), which consists of the Square-shape Pooling Module (SPM) and the Rectangle-shape Pooling Module (RPM). Specifically, SPM aims to capture short-range semantic dependencies of the input layout with different spatial scales, while RPM aims to capture long-range semantic dependencies from both horizontal and vertical directions. We then effectively fuse both outputs of SPM and RPM to further enlarge the receptive field of our generator. Extensive experiments on five popular datasets show that the proposed DPGAN achieves better results than state-of-the-art methods. Finally, both SPM and RPM are general and can be seamlessly integrated into any GAN-based architecture to strengthen the feature representation. The code is available at

* Accepted to TIP
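
As a rough illustration of pooling along horizontal and vertical directions, as described for the rectangle-shape module above, the sketch below averages a 2-D feature map over full rows and full columns and adds the two results back at every position. The averaging and the additive fusion are assumptions made for the example, and `rectangle_pool` is a hypothetical name; the paper's module may differ in both respects.

```python
def rectangle_pool(feature_map):
    """Fuse row-wise and column-wise average pooling of a 2-D feature map.

    Each output position receives the mean of its whole row plus the mean
    of its whole column, so it sees context from across the full map.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    row_means = [sum(row) / width for row in feature_map]
    col_means = [
        sum(feature_map[i][j] for i in range(height)) / height
        for j in range(width)
    ]
    return [
        [row_means[i] + col_means[j] for j in range(width)]
        for i in range(height)
    ]


pooled = rectangle_pool([[1.0, 2.0], [3.0, 4.0]])
```

A square-shape counterpart would instead average over small local windows at several scales, which captures short-range context rather than long-range context.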