"Image To Image Translation": models, code, and papers

3D-Aware Multi-Class Image-to-Image Translation with NeRFs

Mar 27, 2023
Senmao Li, Joost van de Weijer, Yaxing Wang, Fahad Shahbaz Khan, Meiqin Liu, Jian Yang

Recent advances in 3D-aware generative models (3D-aware GANs) combined with Neural Radiance Fields (NeRF) have achieved impressive results. However, no prior work investigates 3D-aware GANs for 3D-consistent multi-class image-to-image (3D-aware I2I) translation, and naively applying 2D I2I translation methods suffers from unrealistic shape/identity changes. To perform 3D-aware multi-class I2I translation, we decouple the learning process into a multi-class 3D-aware GAN step and a 3D-aware I2I translation step. In the first step, we propose two novel techniques: a new conditional architecture and an effective training strategy. In the second step, building on the well-trained multi-class 3D-aware GAN architecture, which preserves view consistency, we construct a 3D-aware I2I translation system. To further reduce view-consistency problems, we propose several new techniques, including a U-net-like adaptor network design, a hierarchical representation constraint, and a relative regularization loss. In extensive experiments on two datasets, quantitative and qualitative results demonstrate that we successfully perform 3D-aware I2I translation with multi-view consistency.

* Accepted by CVPR 2023
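
The abstract ships no implementation, so the following is only a rough, hedged illustration of what a "U-net-like adaptor network" over GAN features could look like: an intermediate feature map is adapted by a small encoder-decoder with a skip connection and a residual output. All layer sizes, the residual placement, and the name UNetAdaptor are assumptions, not the authors' architecture.

```python
# Hypothetical sketch of a "U-net-like adaptor network" as named in the
# abstract; the actual architecture is not specified there, so all sizes
# and the residual design are illustrative assumptions.
import torch
import torch.nn as nn

class UNetAdaptor(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.down1 = nn.Sequential(
            nn.Conv2d(channels, channels * 2, 3, stride=2, padding=1), nn.ReLU())
        self.down2 = nn.Sequential(
            nn.Conv2d(channels * 2, channels * 4, 3, stride=2, padding=1), nn.ReLU())
        self.up1 = nn.Sequential(
            nn.ConvTranspose2d(channels * 4, channels * 2, 4, stride=2, padding=1), nn.ReLU())
        self.up2 = nn.ConvTranspose2d(channels * 4, channels, 4, stride=2, padding=1)

    def forward(self, feat):
        d1 = self.down1(feat)                      # kept as the skip source
        d2 = self.down2(d1)
        u1 = self.up1(d2)
        u2 = self.up2(torch.cat([u1, d1], dim=1))  # U-Net-style skip connection
        return feat + u2                           # residual adaptation of the feature

x = torch.randn(1, 64, 32, 32)
print(UNetAdaptor()(x).shape)  # torch.Size([1, 64, 32, 32])
```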

Guided Image-to-Image Translation by Discriminator-Generator Communication

Mar 07, 2023
Yuanjiang Cao, Lina Yao, Le Pan, Quan Z. Sheng, Xiaojun Chang

Image-to-image (I2I) translation, which has recently drawn increasing attention, aims to transfer an image from a source domain to a target domain. One major branch of this research formulates I2I translation with Generative Adversarial Networks (GANs). As a zero-sum game, GAN training can be reformulated as a Partially Observed Markov Decision Process (POMDP) for the generator, which cannot access the full state of its environment. This formulation exposes an information insufficiency in GAN training. To mitigate the problem, we propose adding a communication channel between discriminators and generators. We explore multiple architectural designs for integrating the communication mechanism into the I2I translation framework. To validate the performance of the proposed approach, we conducted extensive experiments on various benchmark datasets; the results confirm the superiority of our method.
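
As a hedged illustration of the general idea of a discriminator-to-generator communication channel, the minimal PyTorch sketch below has the discriminator return a pooled feature "message" alongside its realism score, which the generator then consumes as extra conditioning. The paper explores multiple designs; this particular wiring is an assumption for illustration only.

```python
# Hedged sketch of one possible discriminator-to-generator communication
# channel: the discriminator emits a pooled feature "message" that is
# broadcast and concatenated with the generator's input. Not the paper's
# exact design.
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2))
        self.score = nn.Conv2d(128, 1, 4, stride=1, padding=1)

    def forward(self, img):
        f = self.features(img)
        # Return both the realism score and a pooled feature "message".
        return self.score(f), f.mean(dim=(2, 3))

class Generator(nn.Module):
    def __init__(self, msg_dim=128):
        super().__init__()
        self.embed = nn.Linear(msg_dim, 3)
        self.net = nn.Sequential(nn.Conv2d(6, 64, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(64, 3, 3, padding=1), nn.Tanh())

    def forward(self, src, message):
        # Broadcast the message over the spatial grid and concatenate it
        # with the source image as conditioning.
        m = self.embed(message)[:, :, None, None].expand(-1, -1, src.size(2), src.size(3))
        return self.net(torch.cat([src, m], dim=1))

G, D = Generator(), Discriminator()
src = torch.randn(2, 3, 64, 64)
_, msg = D(src)              # message from the previous round
fake = G(src, msg.detach())  # generator sees partial discriminator state
```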

Exploring the Power of Generative Deep Learning for Image-to-Image Translation and MRI Reconstruction: A Cross-Domain Review

Mar 16, 2023
Yuda Bi

Deep learning has become a prominent computational modeling tool in computer vision and image processing in recent years. This review comprehensively analyzes the deep-learning methods used for image-to-image translation and reconstruction in the natural and medical imaging domains. We examine well-known deep learning frameworks, such as convolutional neural networks and generative adversarial networks, and their variants, delving into the fundamental principles and difficulties of each. For natural computer vision, we trace the development and extension of various deep generative models. In comparison, we investigate the applications of deep learning to generative medical imaging problems, including medical image translation, MRI reconstruction, and multi-contrast MRI synthesis. This review provides scholars and practitioners in generative computer vision and medical imaging with useful insights by summarizing past work and pointing to future research directions.

Pseudo Supervised Metrics: Evaluating Unsupervised Image to Image Translation Models In Unsupervised Cross-Domain Classification Frameworks

Mar 18, 2023
Firas Al-Hindawi, Md Mahfuzur Rahman Siddiquee, Teresa Wu, Han Hu, Ying Sun

Accurate and efficient image classification depends on access to large labeled datasets and on test data drawn from the same domain the model was trained on. Classification becomes more challenging with data from a different domain, where collecting and labeling a large dataset and training a new classifier from scratch is time-consuming, expensive, and sometimes infeasible. Cross-domain classification frameworks handle this domain-shift problem by using unsupervised image-to-image (UI2I) translation models to translate input images from the unlabeled domain to the labeled domain. The difficulty with these models lies in their unsupervised nature: without annotations, traditional supervised metrics cannot be used to evaluate the translation models and pick the best checkpoint. In this paper, we introduce a new method called Pseudo Supervised Metrics, designed specifically to support cross-domain classification applications, in contrast to commonly used metrics such as FID, which evaluate a model by the quality of the generated images from a human-eye perspective. We show that our metric not only outperforms unsupervised metrics such as FID but is also highly correlated with the true supervised metrics, robust, and explainable. Furthermore, we demonstrate that it can serve as a standard metric for future research in this field by applying it to a critical real-world problem (the boiling crisis problem).

* arXiv admin note: text overlap with arXiv:2212.09107 
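
The abstract does not spell out the metric's formula, so the sketch below shows only one plausible reading: score each UI2I checkpoint by how often a frozen classifier trained on the labeled domain agrees with pseudo-labels on the translated images. The names translate, classifier, and pseudo_labels are hypothetical placeholders, not the paper's API.

```python
# Hedged sketch of a pseudo-supervised checkpoint score under the assumed
# reading above; every name here is a placeholder for illustration.
import torch

@torch.no_grad()
def pseudo_supervised_score(translate, classifier, images, pseudo_labels):
    """Fraction of translated images whose classifier prediction matches
    the pseudo-label assigned in the unlabeled domain."""
    translated = translate(images)                  # unlabeled -> labeled domain
    preds = classifier(translated).argmax(dim=1)
    return (preds == pseudo_labels).float().mean().item()

# Checkpoint selection (sketch): keep the checkpoint with the best score.
# best = max(checkpoints, key=lambda ckpt: pseudo_supervised_score(
#     load(ckpt), classifier, val_images, val_pseudo_labels))
```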

ACE: Zero-Shot Image to Image Translation via Pretrained Auto-Contrastive-Encoder

Feb 22, 2023
Sihan Xu, Zelong Jiang, Ruisi Liu, Kaikai Yang, Zhijie Huang

Image-to-image translation is a fundamental task in computer vision: it transforms images from one domain into images in another domain so that they take on particular domain-specific characteristics. Most prior works train a generative model to learn the mapping from a source domain to a target domain. However, learning such a mapping is challenging because data from different domains can be highly unbalanced in both quality and quantity. To address this problem, we propose a new approach that extracts image features by learning the similarities and differences of samples within the same data distribution via a novel contrastive learning framework, which we call the Auto-Contrastive-Encoder (ACE). ACE learns the content code as the similarity between samples that share content information but have different style perturbations. This design enables us to achieve, for the first time, zero-shot image-to-image translation with no training on image translation tasks. Moreover, our method effectively learns the style features of images from different domains, so our model also achieves competitive results on multimodal image translation tasks with zero-shot learning. Additionally, we demonstrate the potential of our method for transfer learning: with fine-tuning, the quality of translated images improves in unseen domains. Even though we use contrastive learning, all of our training can be performed on a single GPU with a batch size of 8.
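
As a minimal, hedged sketch of the contrastive idea in the abstract (two style-perturbed views of the same image should yield the same content code), the snippet below computes an InfoNCE-style loss over perturbed pairs. The encoder, the perturbation function, and the loss shape are assumptions; the full ACE framework is more involved.

```python
# Hedged sketch: pull content codes of two style-perturbed views of the
# same image together, push codes of different images apart (InfoNCE).
import torch
import torch.nn.functional as F

def contrastive_content_loss(encoder, batch, perturb, temperature=0.1):
    z1 = F.normalize(encoder(perturb(batch)), dim=1)  # view 1: style perturbation A
    z2 = F.normalize(encoder(perturb(batch)), dim=1)  # view 2: style perturbation B
    logits = z1 @ z2.t() / temperature                # pairwise code similarities
    targets = torch.arange(batch.size(0))             # positives on the diagonal
    return F.cross_entropy(logits, targets)
```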

Zero-shot Image-to-Image Translation

Feb 06, 2023
Gaurav Parmar, Krishna Kumar Singh, Richard Zhang, Yijun Li, Jingwan Lu, Jun-Yan Zhu

Large-scale text-to-image generative models have shown their remarkable ability to synthesize diverse and high-quality images. However, it is still challenging to directly apply these models for editing real images for two reasons. First, it is hard for users to come up with a perfect text prompt that accurately describes every visual detail in the input image. Second, while existing models can introduce desirable changes in certain regions, they often dramatically alter the input content and introduce unexpected changes in unwanted regions. In this work, we propose pix2pix-zero, an image-to-image translation method that can preserve the content of the original image without manual prompting. We first automatically discover editing directions that reflect desired edits in the text embedding space. To preserve the general content structure after editing, we further propose cross-attention guidance, which aims to retain the cross-attention maps of the input image throughout the diffusion process. In addition, our method does not need additional training for these edits and can directly use the existing pre-trained text-to-image diffusion model. We conduct extensive experiments and show that our method outperforms existing and concurrent works for both real and synthetic image editing.

* website: https://pix2pixzero.github.io/ 
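
As a hedged sketch of discovering an editing direction in text-embedding space, the snippet below embeds several template sentences containing the source and target concepts and takes the mean difference. The text_encoder callable and the sentence templates are placeholders, not the paper's exact recipe.

```python
# Hedged sketch: an edit direction as the mean difference between target-
# and source-concept sentence embeddings; templates and encoder assumed.
import torch

def edit_direction(text_encoder, source="cat", target="dog", templates=None):
    templates = templates or ["a photo of a {}", "a painting of a {}",
                              "a close-up of a {}"]
    src = torch.stack([text_encoder(t.format(source)) for t in templates])
    tgt = torch.stack([text_encoder(t.format(target)) for t in templates])
    return tgt.mean(dim=0) - src.mean(dim=0)  # added to the prompt embedding
```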

OTRE: Where Optimal Transport Guided Unpaired Image-to-Image Translation Meets Regularization by Enhancing

Feb 09, 2023
Wenhui Zhu, Peijie Qiu, Oana M. Dumitrascu, Jacob M. Sobczak, Mohammad Farazi, Zhangsihao Yang, Keshav Nandakumar, Yalin Wang

Non-mydriatic retinal color fundus photography (CFP) is widely available because it does not require pupillary dilation; however, it is prone to poor quality due to operator error, systemic imperfections, or patient-related causes. Optimal retinal image quality is mandated for accurate medical diagnoses and automated analyses. Herein, we leverage Optimal Transport (OT) theory to propose an unpaired image-to-image translation scheme for mapping low-quality retinal CFPs to high-quality counterparts. Furthermore, to improve the flexibility, robustness, and applicability of our image-enhancement pipeline in clinical practice, we generalize a state-of-the-art model-based image reconstruction method, regularization by denoising, by plugging in priors learned by our OT-guided image-to-image translation network; we call the result regularization by enhancing (RE). We validated the integrated framework, OTRE, on three publicly available retinal image datasets by assessing the quality after enhancement and the performance on various downstream tasks, including diabetic retinopathy grading, vessel segmentation, and diabetic lesion segmentation. The experimental results demonstrate the superiority of our proposed framework over several state-of-the-art unsupervised competitors and a state-of-the-art supervised method.

* Accepted as a conference paper to The 28th biennial international conference on Information Processing in Medical Imaging (IPMI 2023) 
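
By analogy with regularization by denoising (RED), a "regularization by enhancing" iteration might take gradient steps that combine a data-fidelity gradient with a residual pulling the image toward its enhanced version. The sketch below is that analogy only: the step sizes, grad_fidelity, and enhancer are placeholders, and the paper's exact scheme may differ.

```python
# Hedged RED-style iteration with a learned enhancer in the denoiser's
# role: x <- x - mu * (grad_fidelity(x) + lam * (x - enhancer(x))).
import torch

def re_iterations(x, grad_fidelity, enhancer, steps=50, mu=0.1, lam=0.5):
    for _ in range(steps):
        with torch.no_grad():
            prior_residual = x - enhancer(x)   # pull x toward its enhanced version
        x = x - mu * (grad_fidelity(x) + lam * prior_residual)
    return x
```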

Training-free Style Transfer Emerges from h-space in Diffusion models

Mar 27, 2023
Jaeseok Jeong, Mingi Kwon, Youngjung Uh

Diffusion models (DMs) synthesize high-quality images in various domains. However, controlling their generative process remains murky because the intermediate variables in the process have not been rigorously studied. Recently, StyleCLIP-like editing of DMs was found in the bottleneck of the U-Net, named $h$-space. In this paper, we discover that DMs inherently have disentangled representations for the content and style of the resulting images: $h$-space contains the content, and the skip connections convey the style. Furthermore, we introduce a principled way to inject the content of one image into another while respecting the progressive nature of the generative process. Briefly, given the original generative process, 1) the feature of the source content should be gradually blended in, 2) the blended feature should be normalized to preserve the distribution, and 3) the change in the skip connections due to content injection should be calibrated. The resulting image then has the source content with the style of the original image, just as in image-to-image translation. Interestingly, injecting content into the styles of unseen domains produces harmonization-like style transfer. To the best of our knowledge, our method is the first training-free, feed-forward style transfer that uses only an unconditional pretrained frozen generative network. The code is available at https://curryjung.github.io/DiffStyle/.
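
As a hedged sketch of the first two steps listed above (gradual blending into $h$-space and normalization to preserve the feature distribution; skip-connection calibration is omitted), the snippet below blends a source content feature into the bottleneck feature at timestep t. The blending schedule and the normalization choice are assumptions, not the paper's exact rule.

```python
# Hedged sketch: blend a source content feature into the U-Net bottleneck
# ("h-space") and renormalize to the original feature statistics.
import torch

def inject_content(h_orig, h_content, t, T):
    alpha = t / T  # stronger injection early in sampling (assumed schedule)
    h = (1 - alpha) * h_orig + alpha * h_content
    # Normalize the blend, then restore the original feature's mean/std.
    h = (h - h.mean()) / (h.std() + 1e-8)
    return h * h_orig.std() + h_orig.mean()
```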

Unsupervised Domain Adaptation for MRI Volume Segmentation and Classification Using Image-to-Image Translation

Feb 16, 2023
Satoshi Kondo, Satoshi Kasai

Unsupervised domain adaptation is a type of domain adaptation that exploits labeled data from the source domain and unlabeled data from the target domain. In the Cross-Modality Domain Adaptation for Medical Image Segmentation challenge (crossMoDA2022), contrast-enhanced T1 MRI brain volumes are provided as the source domain data, and high-resolution T2 MRI volumes are provided as the target domain data. The crossMoDA2022 challenge contains two tasks: segmentation of vestibular schwannoma (VS) and cochlea, and classification of VS by Koos grade. In this report, we present our solution for the crossMoDA2022 challenge. We employ an image-to-image translation method for unsupervised domain adaptation and a residual U-Net for the segmentation task; we use an SVM for the classification task. The experimental results show that the mean DSC and ASSD are 0.614 and 2.936 for the segmentation task, and the MA-MAE is 0.84 for the classification task.
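
A hedged end-to-end sketch of the pipeline the report describes, translating labeled T1 volumes into pseudo-T2 images, training a segmentation model on the translated data, and fitting an SVM for Koos grading, is shown below. The translator, seg_net (assumed to expose a fit method), and the grading features are placeholders, not the report's code.

```python
# Hedged pipeline sketch under the assumptions stated above.
from sklearn.svm import SVC

def adapt_and_train(translator, seg_net, t1_volumes, t1_labels,
                    grade_features, grades):
    pseudo_t2 = [translator(v) for v in t1_volumes]  # unsupervised I2I translation
    seg_net.fit(pseudo_t2, t1_labels)                # segmentation on pseudo-target data
    clf = SVC().fit(grade_features, grades)          # Koos grade classification
    return seg_net, clf
```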