"Image To Image Translation": models, code, and papers

Deep Reinforcement Learning for Image-to-Image Translation

Sep 24, 2023
Xin Wang, Ziwei Luo, Jing Hu, Chengming Feng, Shu Hu, Bin Zhu, Xi Wu, Siwei Lyu

Most existing Image-to-Image Translation (I2IT) methods generate images in a single run of a deep learning (DL) model. However, designing such a single-step model is challenging: it requires a huge number of parameters and is prone to bad minima and overfitting. In this work, we reformulate I2IT as a step-wise decision-making problem via deep reinforcement learning (DRL) and propose a novel framework that performs RL-based I2IT (RL-I2IT). The key idea of RL-I2IT is to decompose the monolithic learning process into small steps, with a lightweight model that progressively transforms a source image into a target image. Because the conventional RL framework struggles with high-dimensional continuous state and action spaces, we introduce a meta policy with a new concept, the plan, into the standard actor-critic model; the plan has a lower dimension than the original image and makes it tractable for the actor to generate a high-dimensional action. The RL-I2IT framework also employs a task-specific auxiliary learning strategy to stabilize training and improve performance on the corresponding task. Experiments on several I2IT tasks demonstrate the effectiveness and robustness of the proposed method when facing high-dimensional continuous action space problems.
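To make the step-wise formulation above concrete, here is a minimal PyTorch-style sketch of the decomposition: a meta policy compresses the current image into a low-dimensional plan, an actor decodes the plan into a full-resolution action, and the action is applied over several small steps. The module sizes, the additive update rule, and the names MetaPolicy/Actor are illustrative assumptions, not the authors' implementation (which also trains a critic and auxiliary losses).

```python
# Sketch of step-wise image refinement with a low-dimensional "plan"
# (all module sizes and the additive update rule are illustrative assumptions).
import torch
import torch.nn as nn

class MetaPolicy(nn.Module):
    """Encodes the current image (state) into a low-dimensional plan."""
    def __init__(self, plan_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, plan_dim),
        )
    def forward(self, state):
        return self.encoder(state)

class Actor(nn.Module):
    """Decodes the plan into a full-resolution action (an image-sized residual)."""
    def __init__(self, plan_dim=128):
        super().__init__()
        self.fc = nn.Linear(plan_dim, 64 * 8 * 8)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh(),
        )
    def forward(self, plan):
        h = self.fc(plan).view(-1, 64, 8, 8)
        return self.decoder(h)

meta, actor = MetaPolicy(), Actor()
state = torch.rand(1, 3, 32, 32)                 # source image as initial state
for step in range(5):                            # progressively refine in small steps
    plan = meta(state)
    action = actor(plan)
    state = (state + 0.2 * action).clamp(0, 1)   # apply the action to the state
```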

Uncertainty Estimation in Contrast-Enhanced MR Image Translation with Multi-Axis Fusion

Nov 20, 2023
Ivo M. Baltruschat, Parvaneh Janbakhshi, Melanie Dohmen, Matthias Lenga

In recent years, deep learning has been applied to a wide range of medical imaging and image processing tasks. In this work, we focus on estimating epistemic uncertainty for 3D medical image-to-image translation. We propose Multi-Axis Fusion (MAF), a novel model uncertainty quantification method that integrates complementary information derived from multiple views of volumetric image data. The proposed approach is applied to the task of synthesizing contrast-enhanced T1-weighted images from native T1, T2, and T2-FLAIR scans. The quantitative findings indicate a strong correlation ($\rho_{\text{healthy}} = 0.89$) between the mean absolute image synthesis error and the mean uncertainty score for our MAF method. Hence, we consider MAF a promising approach for the highly relevant task of detecting synthesis failures at inference time.
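A rough sketch of the multi-axis idea, under the assumption that a 2D translation model is applied slice-wise along each of the three volume axes and the per-axis predictions are fused by averaging, with their spread serving as the uncertainty map. The slice-wise wrapper, the variance-based score, and the dummy model are illustrative, not the paper's exact procedure.

```python
import numpy as np

def translate_slicewise(volume, model, axis):
    """Apply a 2D image-to-image model slice by slice along one axis."""
    moved = np.moveaxis(volume, axis, 0)
    out = np.stack([model(s) for s in moved], axis=0)
    return np.moveaxis(out, 0, axis)

def multi_axis_fusion(volume, model):
    """Fuse predictions from the three axes; their spread is the uncertainty."""
    preds = np.stack([translate_slicewise(volume, model, ax) for ax in range(3)])
    fused = preds.mean(axis=0)          # fused synthetic contrast-enhanced volume
    uncertainty = preds.std(axis=0)     # voxel-wise disagreement between views
    return fused, uncertainty

# toy stand-in for a trained T1/T2/FLAIR -> contrast-enhanced T1 model
dummy_model = lambda slice2d: np.clip(slice2d * 1.1, 0, 1)
vol = np.random.rand(64, 64, 64)
fused, unc = multi_axis_fusion(vol, dummy_model)
print(fused.shape, float(unc.mean()))   # a high mean score flags likely failures
```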

UGC: Unified GAN Compression for Efficient Image-to-Image Translation

Sep 17, 2023
Yuxi Ren, Jie Wu, Peng Zhang, Manlin Zhang, Xuefeng Xiao, Qian He, Rui Wang, Min Zheng, Xin Pan

Recent years have witnessed rapid progress of Generative Adversarial Networks (GANs) in image-to-image translation. However, the success of these GAN models hinges on heavy computational costs and labor-expensive training data. Current efficient GAN learning techniques typically address one of two orthogonal aspects: i) model slimming, which reduces calculation costs; ii) data/label-efficient learning, which requires fewer training data or labels. To combine the best of both worlds, we propose a new learning paradigm, Unified GAN Compression (UGC), with a unified optimization objective that seamlessly promotes the synergy of model-efficient and label-efficient learning. UGC sequentially sets up semi-supervised-driven network architecture search and adaptive online semi-supervised distillation stages, which together form a heterogeneous mutual learning scheme that yields an architecture-flexible, label-efficient, and high-performing model.
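As one hedged illustration of how a model-efficient and a label-efficient objective can share a single loss, the sketch below combines supervised reconstruction and adversarial terms on a small paired set with online distillation from a larger teacher on unlabeled images. The specific loss terms, weights, and toy modules are assumptions and do not reproduce UGC's architecture search stage.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def compressed_student_loss(student, teacher, disc, paired_batch, unlabeled_batch,
                            w_distill=1.0, w_adv=1.0, w_rec=10.0):
    """Toy unified objective: reconstruction + adversarial terms on the small
    paired set, plus online distillation from a teacher on unlabeled images."""
    x_pair, y_pair = paired_batch
    y_student = student(x_pair)
    rec = F.l1_loss(y_student, y_pair)           # supervised reconstruction
    adv = -disc(y_student).mean()                 # generator raises the critic's score
    with torch.no_grad():
        y_teacher = teacher(unlabeled_batch)      # pseudo targets, no labels needed
    distill = F.l1_loss(student(unlabeled_batch), y_teacher)
    return w_rec * rec + w_adv * adv + w_distill * distill

# toy stand-ins: a small "student" generator, a larger "teacher", a patch critic
student = nn.Conv2d(3, 3, 3, padding=1)
teacher = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(16, 3, 3, padding=1))
disc = nn.Conv2d(3, 1, 4, stride=2, padding=1)
loss = compressed_student_loss(student, teacher, disc,
                               (torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64)),
                               torch.rand(2, 3, 64, 64))
loss.backward()
```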

H-Packer: Holographic Rotationally Equivariant Convolutional Neural Network for Protein Side-Chain Packing

Nov 15, 2023
Gian Marco Visani, William Galvin, Michael Neal Pun, Armita Nourmohammad

Accurately modeling protein 3D structure is essential for the design of functional proteins. An important sub-task of structure modeling is protein side-chain packing: predicting the conformation of side-chains (rotamers) given the protein's backbone structure and amino-acid sequence. Conventional approaches for this task rely on expensive sampling procedures over hand-crafted energy functions and rotamer libraries. Recently, several deep learning methods have been developed to tackle the problem in a data-driven way, albeit with vastly different formulations (from image-to-image translation to directly predicting atomic coordinates). Here, we frame the problem as a joint regression over the side-chains' true degrees of freedom: the dihedral $\chi$ angles. We carefully study possible objective functions for this task, accounting for its underlying symmetries. We propose Holographic Packer (H-Packer), a novel two-stage algorithm for side-chain packing built on top of two lightweight rotationally equivariant neural networks. We evaluate our method on CASP13 and CASP14 targets. H-Packer is computationally efficient, performs favorably against conventional physics-based algorithms, and is competitive with alternative deep learning solutions.

* Accepted as a conference paper at MLCB 2023. 8 pages main body, 20 pages with appendix. 10 figures 
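The joint regression over dihedral angles needs an objective that respects their periodicity, including the pi-periodic symmetry of some terminal chi angles. A minimal sketch of such a periodic angular loss follows; the wrap-around squared error and the toy periods are our illustration, not necessarily the objective ultimately chosen in the paper.

```python
import torch

def chi_angle_loss(pred, target, period):
    """Squared angular error that respects each chi angle's periodicity.
    `period` is 2*pi for ordinary angles and pi for symmetric side-chain
    terminal angles, handled per entry."""
    diff = torch.remainder(pred - target + period / 2, period) - period / 2
    return (diff ** 2).mean()

# toy batch: 8 residues x 4 chi angles (radians)
pred = torch.rand(8, 4) * 2 * torch.pi
target = torch.rand(8, 4) * 2 * torch.pi
period = torch.full((8, 4), 2 * torch.pi)
period[:, -1] = torch.pi                  # pretend the last angle is pi-periodic
print(chi_angle_loss(pred, target, period))
```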

SCONE-GAN: Semantic Contrastive learning-based Generative Adversarial Network for an end-to-end image translation

Nov 07, 2023
Iman Abbasnejad, Fabio Zambetta, Flora Salim, Timothy Wiley, Jeffrey Chan, Russell Gallagher, Ehsan Abbasnejad

SCONE-GAN is an end-to-end image translation method that is shown to be effective for learning to generate realistic and diverse scenery images. Most current image-to-image translation approaches are devised as two mappings: a translation from the source to the target domain and another representing its inverse. While successful in many applications, these approaches can produce trivial solutions with limited diversity, because they learn the more frequent associations rather than the scene structure. To mitigate this problem, SCONE-GAN uses graph convolutional networks to learn object dependencies, maintain the image structure, and preserve its semantics while translating images into the target domain. For more realistic and diverse image generation, we introduce a style reference image and encourage the model to maximize the mutual information between the style image and the output. The proposed method explicitly maximizes the mutual information between related patches, thus encouraging the generator to produce more diverse images. We validate the proposed algorithm on image-to-image translation and on stylizing outdoor images. Both qualitative and quantitative results demonstrate the effectiveness of our approach on four datasets.

* 9 pages, 5 figures 
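A patch-wise mutual information objective of this kind is commonly realized with an InfoNCE-style contrastive loss, where spatially corresponding patches are positives and all others are negatives. The sketch below shows that standard construction under the assumption of spatially aligned feature maps; it is an illustration, not SCONE-GAN's exact loss.

```python
import torch
import torch.nn.functional as F

def patch_infonce(feat_a, feat_b, temperature=0.07):
    """InfoNCE over spatially aligned patch features: each patch in `feat_a`
    should match the patch at the same location in `feat_b` and repel the rest.
    Maximizing this lower-bounds the mutual information between related patches."""
    B, C, H, W = feat_a.shape
    a = F.normalize(feat_a.flatten(2).transpose(1, 2), dim=-1)   # (B, HW, C)
    b = F.normalize(feat_b.flatten(2).transpose(1, 2), dim=-1)   # (B, HW, C)
    logits = torch.bmm(a, b.transpose(1, 2)) / temperature       # (B, HW, HW)
    labels = torch.arange(H * W, device=feat_a.device).expand(B, -1)
    return F.cross_entropy(logits.reshape(B * H * W, H * W), labels.reshape(-1))

# toy features, e.g. extracted from the style image and the generated output
loss = patch_infonce(torch.randn(2, 64, 8, 8), torch.randn(2, 64, 8, 8))
print(float(loss))
```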

Masked Discriminators for Content-Consistent Unpaired Image-to-Image Translation

Sep 22, 2023
Bonifaz Stuhr, Jürgen Brauer, Bernhard Schick, Jordi Gonzàlez

A common goal of unpaired image-to-image translation is to preserve content consistency between source images and translated images while mimicking the style of the target domain. Due to biases between the datasets of the two domains, many methods suffer from inconsistencies caused by the translation process. Most approaches introduced to mitigate these inconsistencies do not constrain the discriminator, leading to an even more ill-posed training setup. Moreover, none of these approaches is designed for larger crop sizes. In this work, we show that masking the inputs of a global discriminator for both domains with a content-based mask is sufficient to significantly reduce content inconsistencies. However, this strategy leads to artifacts that can be traced back to the masking process. To reduce these artifacts, we introduce a local discriminator that operates on pairs of small crops selected with a similarity sampling strategy. Furthermore, we apply this sampling strategy to sample global input crops from the source and target datasets. In addition, we propose feature-attentive denormalization to selectively incorporate content-based statistics into the generator stream. In our experiments, we show that our method achieves state-of-the-art performance in photorealistic sim-to-real translation and weather translation, and also performs well in day-to-night translation. Additionally, we propose the cKVD metric, which builds on the sKVD metric and enables the examination of translation quality at the class or category level.

* 24 pages, 22 figures, under review 
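A minimal sketch of the masked global discriminator idea, assuming the same content-based mask is applied to both the real target-domain image and the translated image before they reach the discriminator. The random mask and the tiny discriminator are stand-ins for illustration; in the paper the mask is derived from content information.

```python
import torch
import torch.nn as nn

class MaskedGlobalDiscriminator(nn.Module):
    """Global discriminator that only sees the content selected by a shared mask,
    so it cannot penalize content that legitimately differs between the domains."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 1, 4, stride=2, padding=1),
        )
    def forward(self, image, content_mask):
        return self.net(image * content_mask)   # mask real and translated inputs alike

disc = MaskedGlobalDiscriminator()
real_target = torch.rand(1, 3, 64, 64)
translated = torch.rand(1, 3, 64, 64)
mask = (torch.rand(1, 1, 64, 64) > 0.5).float()  # stand-in for a content-based mask
d_real = disc(real_target, mask)
d_fake = disc(translated, mask)
```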

CycleGANAS: Differentiable Neural Architecture Search for CycleGAN

Nov 13, 2023
Taegun An, Changhee Joo

We develop a Neural Architecture Search (NAS) framework for CycleGAN, which carries out the unpaired image-to-image translation task. Extending previous NAS techniques for Generative Adversarial Networks (GANs) to CycleGAN is not straightforward due to the different task and the larger search space. We design architectures that consist of a stack of simple ResNet-based cells and develop a search method that effectively explores the large search space. We show that our framework, called CycleGANAS, not only discovers high-performance architectures that match or surpass the performance of the original CycleGAN, but also successfully addresses data imbalance by searching an individual architecture for each translation direction. To the best of our knowledge, this is the first NAS result for CycleGAN, and it sheds light on NAS for more complex structures.
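A differentiable search over a stack of simple cells is typically implemented DARTS-style, with a softmax over learnable architecture weights mixing candidate operations. The sketch below shows that mechanism with illustrative candidate operations; it is not CycleGANAS's actual search space.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedCell(nn.Module):
    """Differentiable choice between candidate operations inside one cell:
    a softmax over learnable architecture weights mixes the candidates, so the
    architecture can be optimized by gradient descent alongside the weights."""
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, channels, 5, padding=2),
            nn.Identity(),                                   # skip connection
        ])
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))  # architecture weights
    def forward(self, x):
        w = F.softmax(self.alpha, dim=0)
        return x + sum(wi * op(x) for wi, op in zip(w, self.ops))  # residual cell

cells = nn.Sequential(*[MixedCell(16) for _ in range(4)])   # stack of searchable cells
out = cells(torch.rand(1, 16, 32, 32))
```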

Visible to Thermal image Translation for improving visual task in low light conditions

Nov 09, 2023
Md Azim Khan

Several visual tasks, such as pedestrian detection and image-to-image translation, are challenging to accomplish in low light using RGB images. Heat variation of objects in thermal images can be used to overcome this. In this work, an end-to-end framework consisting of a generative network and a detector network is proposed to translate RGB images into thermal ones and to compare the generated thermal images with real data. We collected images from two different locations using the Parrot Anafi Thermal drone. We then created a two-stream network, preprocessed and augmented the image data, and trained the generator and discriminator models from scratch. The findings demonstrate that it is feasible to translate RGB training data to thermal data using a GAN. As a result, thermal data can be produced more quickly and affordably, which is useful for security and surveillance applications.
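For reference, a paired RGB-to-thermal translation step of the kind described above can be sketched as a standard conditional GAN update: an adversarial term plus an L1 term against the paired thermal image. The toy generator, discriminator, and loss weights below are assumptions, not the paper's two-stream network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# toy generator (RGB -> 1-channel thermal) and discriminator
gen = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(16, 1, 3, padding=1))
disc = nn.Sequential(nn.Conv2d(1, 16, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
                     nn.Conv2d(16, 1, 4, stride=2, padding=1))
opt_g = torch.optim.Adam(gen.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=2e-4)

def train_step(rgb, thermal):
    fake = gen(rgb)
    # discriminator: real thermal vs. generated thermal
    d_real, d_fake = disc(thermal), disc(fake.detach())
    d_loss = F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) + \
             F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # generator: fool the discriminator and stay close to the paired thermal image
    d_gen = disc(fake)
    g_loss = F.binary_cross_entropy_with_logits(d_gen, torch.ones_like(d_gen)) + \
             100 * F.l1_loss(fake, thermal)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

train_step(torch.rand(2, 3, 64, 64), torch.rand(2, 1, 64, 64))
```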

Few-shot Image Generation via Style Adaptation and Content Preservation

Nov 30, 2023
Xiaosheng He, Fan Yang, Fayao Liu, Guosheng Lin

Training a generative model with limited data (e.g., 10 samples) is very challenging. Many works propose to fine-tune a pre-trained GAN model, but this easily results in overfitting: they manage to adapt the style but fail to preserve the content, where \textit{style} denotes the specific properties that define a domain and \textit{content} denotes the domain-irrelevant information that represents diversity. Recent works try to maintain a pre-defined correspondence to preserve the content; however, the diversity is still insufficient, and the constraint may hinder style adaptation. In this work, we propose a paired image reconstruction approach for content preservation. We introduce an image translation module into GAN transfer, where the module teaches the generator to separate style and content, and the generator in turn provides training data to the translation module. Qualitative and quantitative experiments show that our method consistently surpasses the state-of-the-art methods in the few-shot setting.
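One plausible reading of the paired reconstruction idea is sketched below: images generated from the same latent by the frozen source generator and the adapted generator form a pair, and a translation module must map one onto the other, which penalizes content drift during adaptation. The module shapes and the L1 reconstruction term are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# toy stand-ins: frozen source generator, adapted target generator, translation module
g_src = nn.Sequential(nn.Linear(64, 3 * 32 * 32), nn.Tanh())    # pre-trained, frozen
g_tgt = nn.Sequential(nn.Linear(64, 3 * 32 * 32), nn.Tanh())    # being adapted
translate = nn.Linear(3 * 32 * 32, 3 * 32 * 32)                 # image translation module

def content_preservation_loss(z):
    """Paired reconstruction: images generated from the same latent by the source
    and adapted generators form a pair, and the translation module must map one
    onto the other, discouraging the adapted generator from losing content."""
    with torch.no_grad():
        x_src = g_src(z)          # content reference from the frozen source model
    x_tgt = g_tgt(z)
    return F.l1_loss(translate(x_src), x_tgt)

loss = content_preservation_loss(torch.randn(4, 64))
loss.backward()
```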

Nucleus-aware Self-supervised Pretraining Using Unpaired Image-to-image Translation for Histopathology Images

Sep 14, 2023
Zhiyun Song, Penghui Du, Junpeng Yan, Kailu Li, Jianzhong Shou, Maode Lai, Yubo Fan, Yan Xu

Self-supervised pretraining attempts to enhance model performance by obtaining effective features from unlabeled data, and has demonstrated its effectiveness on histopathology images. Despite this success, few works concentrate on extracting nucleus-level information, which is essential for pathologic analysis. In this work, we propose a novel nucleus-aware self-supervised pretraining framework for histopathology images. The framework captures nuclear morphology and distribution information through unpaired image-to-image translation between histopathology images and pseudo mask images. The generation process is modulated by both conditional and stochastic style representations, ensuring the realism and diversity of the generated histopathology images used for pretraining. Further, an instance segmentation guided strategy is employed to capture instance-level information. Experiments on 7 datasets show that the proposed pretraining method outperforms supervised pretraining on Kather classification, multiple instance learning, and 5 dense-prediction tasks under the transfer learning protocol, and yields superior results to other self-supervised approaches on 8 semi-supervised tasks. Our project is publicly available at https://github.com/zhiyuns/UNITPathSSL.
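To illustrate where the pseudo mask images in the unpaired mask-to-image translation might come from, the sketch below draws random elliptical blobs as a synthetic nucleus mask. The ellipse-based construction and its parameters are our assumption, not the paper's exact procedure; see the linked repository for the real implementation.

```python
import numpy as np

def random_pseudo_nucleus_mask(size=256, n_nuclei=80, radius=(4, 9), rng=None):
    """Draw random elliptical blobs as a pseudo nucleus mask; pairing real patches
    with such synthetic masks gives the unpaired mask<->image translation task
    its nucleus-level supervision signal."""
    rng = rng or np.random.default_rng()
    yy, xx = np.mgrid[:size, :size]
    mask = np.zeros((size, size), dtype=np.uint8)
    for _ in range(n_nuclei):
        cy, cx = rng.integers(0, size, 2)
        ry, rx = rng.integers(radius[0], radius[1], 2)
        mask |= (((yy - cy) / ry) ** 2 + ((xx - cx) / rx) ** 2 <= 1).astype(np.uint8)
    return mask

mask = random_pseudo_nucleus_mask()
print(mask.shape, mask.mean())   # fraction of pixels covered by pseudo nuclei
```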
