"Image To Image Translation": models, code, and papers

DSI2I: Dense Style for Unpaired Image-to-Image Translation

Dec 29, 2022
Baran Ozaydin, Tong Zhang, Sabine Süsstrunk, Mathieu Salzmann

Unpaired exemplar-based image-to-image (UEI2I) translation aims to translate a source image to a target image domain with the style of a target image exemplar, without ground-truth input-translation pairs. Existing UEI2I methods represent style using either a global, image-level feature vector or one vector per object instance/class, which requires knowledge of the scene semantics. Here, by contrast, we propose to represent style as a dense feature map, allowing for a finer-grained transfer to the source image without requiring any external semantic information. We then rely on perceptual and adversarial losses to disentangle our dense style and content representations, and exploit unsupervised cross-domain semantic correspondences to warp the exemplar style to the source content. We demonstrate the effectiveness of our method on two datasets using standard metrics together with a new localized style metric that measures style similarity in a class-wise manner. Our results show that the translations produced by our approach are more diverse and closer to the exemplars than those of state-of-the-art methods while nonetheless preserving the source content.
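
As a rough illustration of the dense-style idea, the sketch below warps a dense exemplar style map onto the source layout using a simple attention over feature correspondences. The encoder features, channel sizes, and the softmax-based correspondence step are assumptions for the sketch, not the authors' implementation.

```python
# Illustrative sketch of warping a dense style map via cross-domain correspondences.
import torch
import torch.nn.functional as F

def warp_dense_style(src_content, ex_content, ex_style, tau=0.01):
    """Warp the exemplar's dense style map onto the source content layout.

    src_content: (B, C, H, W) content features of the source image
    ex_content:  (B, C, H, W) content features of the exemplar
    ex_style:    (B, S, H, W) dense (per-location) style features of the exemplar
    Returns a (B, S, H, W) style map aligned with the source content.
    """
    B, C, H, W = src_content.shape
    S = ex_style.shape[1]

    # Flatten spatial dimensions and L2-normalize for cosine similarity.
    q = F.normalize(src_content.flatten(2), dim=1)   # (B, C, HW)
    k = F.normalize(ex_content.flatten(2), dim=1)    # (B, C, HW)
    v = ex_style.flatten(2)                          # (B, S, HW)

    # Each source location attends to exemplar locations (cross-domain correspondence).
    attn = torch.softmax(q.transpose(1, 2) @ k / tau, dim=-1)   # (B, HW_src, HW_ex)
    warped = v @ attn.transpose(1, 2)                           # (B, S, HW_src)
    return warped.view(B, S, H, W)

# Toy usage with random features.
warped_style = warp_dense_style(torch.randn(1, 64, 32, 32),
                                torch.randn(1, 64, 32, 32),
                                torch.randn(1, 8, 32, 32))
print(warped_style.shape)  # torch.Size([1, 8, 32, 32])
```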

A Generalized Framework for Critical Heat Flux Detection Using Unsupervised Image-to-Image Translation

Dec 25, 2022
Firas Al-Hindawi, Tejaswi Soori, Han Hu, Md Mahfuzur Rahman Siddiquee, Hyunsoo Yoon, Teresa Wu, Ying Sun

This work proposes a framework for generalizing Critical Heat Flux (CHF) detection classification models using an Unsupervised Image-to-Image (UI2I) translation model. The framework enables a typical classification model that was trained and tested on boiling images from domain A to classify boiling images from a domain B that the classification model has never seen. This is done by using the UI2I model to transform the domain B images to look like the domain A images that the classification model is familiar with. Although a CNN was used as the classification model and Fixed-Point GAN (FP-GAN) was used as the UI2I model, the framework is model agnostic: it can generalize any type of image classification model, making it applicable to a variety of similar applications rather than being limited to the boiling crisis detection problem. It also means that as UI2I models advance, the performance of the framework improves.

* This work has been submitted to the Expert Systems With Applications Journal on Sep 25, 2022 
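
To make the framework above concrete, here is a hedged sketch of its inference pipeline: a UI2I generator maps unseen domain-B images into the familiar domain A before the frozen classifier sees them. The `translator_b_to_a` and `classifier` modules below are hypothetical stand-ins for the trained FP-GAN generator and CNN, and the class count is assumed.

```python
# Sketch: classify images from an unseen domain by translating them to the trained domain first.
import torch
import torch.nn as nn

@torch.no_grad()
def classify_unseen_domain(images_b, translator_b_to_a, classifier):
    """Translate domain-B boiling images to the domain-A style, then classify."""
    translator_b_to_a.eval()
    classifier.eval()
    images_a_like = translator_b_to_a(images_b)   # B -> "looks like A"
    logits = classifier(images_a_like)            # classifier was trained on A only
    return logits.argmax(dim=1)

# Toy usage with placeholder modules (identity translator, flatten + linear classifier).
translator = nn.Identity()
classifier = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 2))  # 2 assumed classes
preds = classify_unseen_domain(torch.randn(4, 3, 64, 64), translator, classifier)
print(preds.shape)  # torch.Size([4])
```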

Unsupervised Domain Adaptation for Semantic Segmentation using One-shot Image-to-Image Translation via Latent Representation Mixing

Dec 07, 2022
Sarmad F. Ismael, Koray Kayabol, Erchan Aptoula

Domain adaptation is one of the prominent strategies for handling both domain shift, which is widely encountered in large-scale land use/land cover map calculation, and the scarcity of pixel-level ground truth that is crucial for supervised semantic segmentation. Studies focusing on adversarial domain adaptation via re-styling source domain samples, commonly through generative adversarial networks, have reported varying levels of success, yet they suffer from semantic inconsistencies and visual corruptions, and often require a large number of target domain samples. In this letter, we propose a new unsupervised domain adaptation method for the semantic segmentation of very high resolution images that i) leads to semantically consistent and noise-free images, ii) operates with a single target domain sample (i.e. one-shot), and iii) requires a fraction of the parameters of state-of-the-art methods. More specifically, an image-to-image translation paradigm is proposed, based on an encoder-decoder principle in which latent content representations are mixed across domains, and a perceptual network module and loss function are further introduced to enforce semantic consistency. Cross-city comparative experiments show that the proposed method outperforms state-of-the-art domain adaptation methods. Our source code will be available at \url{https://github.com/Sarmadfismael/LRM_I2I}.
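
A minimal sketch of the latent-mixing idea, assuming a toy encoder-decoder and a fixed mixing ratio; it is not the released code at the URL above.

```python
# Sketch: encode source and target, blend their latent content, decode a re-styled source.
import torch
import torch.nn as nn

class MixTranslator(nn.Module):
    def __init__(self, channels=3, latent=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(channels, latent, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(latent, latent, 3, stride=2, padding=1), nn.ReLU())
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(latent, latent, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(latent, channels, 4, stride=2, padding=1), nn.Tanh())

    def forward(self, src, tgt, alpha=0.5):
        z_src, z_tgt = self.encoder(src), self.encoder(tgt)
        z_mix = alpha * z_src + (1.0 - alpha) * z_tgt   # mix latent content across domains
        return self.decoder(z_mix)

# Toy usage: one source tile and a single target-domain exemplar (one-shot setting).
model = MixTranslator()
out = model(torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64))
print(out.shape)  # torch.Size([1, 3, 64, 64])
```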

Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation

Nov 22, 2022
Narek Tumanyan, Michal Geyer, Shai Bagon, Tali Dekel

Large-scale text-to-image generative models have been a revolutionary breakthrough in the evolution of generative AI, allowing us to synthesize diverse images that convey highly complex visual concepts. However, a pivotal challenge in leveraging such models for real-world content creation tasks is providing users with control over the generated content. In this paper, we present a new framework that takes text-to-image synthesis to the realm of image-to-image translation -- given a guidance image and a target text prompt, our method harnesses the power of a pre-trained text-to-image diffusion model to generate a new image that complies with the target text, while preserving the semantic layout of the source image. Specifically, we observe and empirically demonstrate that fine-grained control over the generated structure can be achieved by manipulating spatial features and their self-attention inside the model. This results in a simple and effective approach, where features extracted from the guidance image are directly injected into the generation process of the target image, requiring no training or fine-tuning and applicable to both real and generated guidance images. We demonstrate high-quality results on versatile text-guided image translation tasks, including translating sketches, rough drawings, and animations into realistic images, changing the class and appearance of objects in a given image, and modifying global qualities such as lighting and color.
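
The sketch below illustrates the feature-injection mechanism in a generic way, using forward hooks on a toy network rather than a real pre-trained diffusion UNet: activations from the guidance pass are cached and substituted during the target pass. All layer names and modules are assumptions for illustration.

```python
# Sketch: cache activations on a guidance pass, inject them on the generation pass.
import torch
import torch.nn as nn

class FeatureInjector:
    def __init__(self, model, layer_names):
        self.cache, self.inject = {}, False
        for name, module in model.named_modules():
            if name in layer_names:
                module.register_forward_hook(self._hook(name))

    def _hook(self, name):
        def hook(module, inputs, output):
            if self.inject:
                return self.cache[name]          # replace output with guidance features
            self.cache[name] = output.detach()   # otherwise, record them
        return hook

# Toy "denoiser"; in practice this would be a pre-trained text-to-image diffusion UNet.
net = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.Conv2d(8, 3, 3, padding=1))
injector = FeatureInjector(net, layer_names={"0"})

guidance = torch.randn(1, 3, 32, 32)
_ = net(guidance)                          # pass 1: cache spatial features of the guidance image
injector.inject = True
target = net(torch.randn(1, 3, 32, 32))    # pass 2: generate with injected features
print(target.shape)                        # torch.Size([1, 3, 32, 32])
```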

Augmenting Ego-Vehicle for Traffic Near-Miss and Accident Classification Dataset using Manipulating Conditional Style Translation

Jan 06, 2023
Hilmil Pradana, Minh-Son Dao, Koji Zettsu

To develop advanced self-driving systems, many researchers are focusing on detecting all possible traffic risk cases from closed-circuit television (CCTV) and dashboard-mounted cameras. Most of these methods identify, frame by frame, when an anomaly has occurred, but they cannot determine which road traffic participant may lead the ego-vehicle into a collision, because the available annotated datasets only support anomaly detection on traffic video. A near-miss is one type of accident and can be defined as a narrowly avoided accident. However, before the accident happens there is no observable difference between an accident and a near-miss, so our contribution is to redefine the accident definition and re-annotate the inconsistent accident labels in the DADA-2000 dataset, together with near-misses. By extending the start and end times of the accident duration, our annotation precisely covers all ego-motions during an incident and consistently classifies all possible traffic risk accidents, including near-misses, to provide more critical information for real-world driving assistance systems. The proposed method integrates two components: conditional style translation (CST) and a separable 3-dimensional convolutional neural network (S3D). The CST architecture is derived from unsupervised image-to-image translation networks (UNIT) and is used to augment the re-annotated DADA-2000 dataset, increasing the number of traffic risk accident videos and generalizing the performance of the video classification model across different conditions, while S3D performs the video classification used to verify the consistency of the dataset re-annotation. In evaluation, the proposed method achieved a significant improvement of 10.25% in accuracy over the baseline model in cross-validation analysis.

* International Conference on Digital Image Computing: Techniques and Applications (DICTA) 2022  
* 8 pages, conference 
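
A hedged sketch of the two-component pipeline described above: a CST-style translator produces condition-varied copies of each annotated clip, and a 3D-CNN video classifier (standing in for S3D) consumes the augmented set. Both modules and the class count are placeholders, not the paper's implementation.

```python
# Sketch: style-translation-based augmentation of video clips feeding a 3D-CNN classifier.
import torch
import torch.nn as nn

def augment_clips(clips, translator, conditions):
    """Return the original clips plus one translated copy per target condition."""
    augmented = [clips]
    for cond in conditions:
        augmented.append(translator(clips, cond))
    return torch.cat(augmented, dim=0)

# Placeholder translator: would be the CST network derived from UNIT.
translator = lambda clips, cond: clips + 0.1 * torch.randn_like(clips)
# Placeholder classifier: would be S3D; here a tiny 3D conv with a pooled linear head.
classifier = nn.Sequential(nn.Conv3d(3, 8, 3, padding=1), nn.AdaptiveAvgPool3d(1),
                           nn.Flatten(), nn.Linear(8, 3))  # 3 assumed risk classes

clips = torch.randn(2, 3, 16, 64, 64)              # (batch, channels, frames, H, W)
augmented = augment_clips(clips, translator, conditions=["night", "rain"])
logits = classifier(augmented)
print(augmented.shape, logits.shape)               # (6, 3, 16, 64, 64) (6, 3)
```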

Constraining Multi-scale Pairwise Features between Encoder and Decoder Using Contrastive Learning for Unpaired Image-to-Image Translation

Nov 20, 2022
Xiuding Cai, Yaoyao Zhu, Dong Miao, Linjie Fu, Yu Yao

Contrastive learning (CL) has shown great potential in image-to-image translation (I2I). Current CL-based I2I methods usually re-exploit the encoder of the generator to maximize the mutual information between the input and generated images, which does not exert an active effect on the decoder part. In addition, though negative samples play a crucial role in CL, most existing methods adopt a random sampling strategy, which may be less effective. In this paper, we rethink the CL paradigm for unpaired I2I tasks from two perspectives and propose a new one-sided image translation framework called EnCo. First, we present an explicit constraint on the multi-scale pairwise features between the encoder and decoder of the generator to guarantee the semantic consistency of the input and generated images. Second, we propose a discriminative attention-guided negative sampling strategy to replace random negative sampling, which significantly improves the performance of the generative model with almost negligible computational overhead. Compared with existing methods, EnCo is more effective and efficient. Extensive experiments on several popular I2I datasets demonstrate the effectiveness and advantages of our proposed approach, and we achieve several state-of-the-art results compared to previous methods.

* 16 pages, 10 figures 
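
For intuition, here is a hedged sketch of a patch-wise InfoNCE constraint between encoder and decoder features at one scale, in the spirit of the pairwise constraint described above. The sampling is uniform rather than the paper's attention-guided negative sampling, and all shapes and names are illustrative.

```python
# Sketch: contrastive constraint pairing encoder and decoder features at the same locations.
import torch
import torch.nn.functional as F

def patch_nce_loss(enc_feat, dec_feat, num_patches=64, tau=0.07):
    """enc_feat, dec_feat: (B, C, H, W) feature maps at a matching scale."""
    B, C, H, W = enc_feat.shape
    idx = torch.randint(0, H * W, (num_patches,))             # same locations in both maps
    q = F.normalize(dec_feat.flatten(2)[:, :, idx], dim=1)    # (B, C, P) queries
    k = F.normalize(enc_feat.flatten(2)[:, :, idx], dim=1)    # (B, C, P) keys

    # logits[b, i, j]: similarity of decoder patch i with encoder patch j;
    # the diagonal (same location) is the positive, all other patches are negatives.
    logits = torch.bmm(q.transpose(1, 2), k) / tau            # (B, P, P)
    labels = torch.arange(num_patches).expand(B, -1)          # positives on the diagonal
    return F.cross_entropy(logits.reshape(-1, num_patches), labels.reshape(-1))

loss = patch_nce_loss(torch.randn(2, 32, 16, 16), torch.randn(2, 32, 16, 16))
print(loss.item())
```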

DGFont++: Robust Deformable Generative Networks for Unsupervised Font Generation

Dec 30, 2022
Xinyuan Chen, Yangchen Xie, Li Sun, Yue Lu

Automatic font generation without human experts is a practical and significant problem, especially for languages that consist of a large number of characters. Existing methods for font generation are often based on supervised learning and require a large amount of paired data, which is labor-intensive and expensive to collect. In contrast, common unsupervised image-to-image translation methods are not applicable to font generation, as they often define style as the set of textures and colors. In this work, we propose a robust deformable generative network for unsupervised font generation (abbreviated as DGFont++). We introduce a feature deformation skip connection (FDSC) to learn local patterns and geometric transformations between fonts. The FDSC predicts pairs of displacement maps and employs the predicted maps to apply deformable convolution to the low-level content feature maps. The outputs of the FDSC are fed into a mixer to generate the final results. Moreover, we introduce contrastive self-supervised learning to learn a robust style representation for fonts by understanding the similarities and dissimilarities of fonts. To distinguish different styles, we train our model with a multi-task discriminator, which ensures that each style can be discriminated independently. In addition to the adversarial loss, two reconstruction losses are adopted to constrain the domain-invariant characteristics between generated images and content images. Taking advantage of the FDSC and the adopted loss functions, our model is able to maintain spatial information and generate high-quality character images in an unsupervised manner. Experiments demonstrate that our model generates character images of higher quality than state-of-the-art methods.

* arXiv admin note: substantial text overlap with arXiv:2104.03064 
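
A minimal sketch of the feature-deformation skip-connection idea, assuming that offsets are predicted from concatenated content and style features and applied with torchvision's deformable convolution; channel sizes and the offset predictor are illustrative, not the DGFont++ code.

```python
# Sketch: predict displacement maps, then deform low-level content features with them.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class FDSC(nn.Module):
    def __init__(self, channels=64, kernel_size=3):
        super().__init__()
        # Two offsets (dx, dy) per kernel sampling location.
        self.offset_pred = nn.Conv2d(2 * channels, 2 * kernel_size * kernel_size,
                                     kernel_size, padding=1)
        self.deform = DeformConv2d(channels, channels, kernel_size, padding=1)

    def forward(self, content_feat, style_feat):
        offsets = self.offset_pred(torch.cat([content_feat, style_feat], dim=1))
        return self.deform(content_feat, offsets)   # deformed skip-connection features

fdsc = FDSC()
out = fdsc(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
print(out.shape)  # torch.Size([1, 64, 32, 32])
```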

Artifact Removal in Histopathology Images

Dec 16, 2022
Cameron Dahan, Stergios Christodoulidis, Maria Vakalopoulou, Joseph Boyd

In the clinical setting of histopathology, whole-slide image (WSI) artifacts frequently arise, distorting regions of interest and having a pernicious impact on WSI analysis. Image-to-image translation networks such as CycleGANs are in principle capable of learning an artifact removal function from unpaired data. However, we identify a surjection problem with artifact removal and propose a weakly-supervised extension to CycleGAN to address it. We assemble a pan-cancer dataset comprising artifact and clean tiles from the TCGA database. Promising results highlight the soundness of our method.

* Corrected typos, small modification of Figure 1 (+ reflected in Section 2.1), results unchanged 
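
For context, the sketch below shows the unpaired CycleGAN-style objective that underlies the approach (the paper's weakly-supervised extension is not reproduced here): G maps artifact tiles to clean tiles, F maps clean to artifact, and the cycle term preserves tissue content while the adversarial terms handle the style change. All modules are toy placeholders.

```python
# Sketch: generator-side CycleGAN losses for unpaired artifact <-> clean translation.
import torch
import torch.nn as nn

def cycle_losses(G, F_, D_clean, D_artifact, x_artifact, y_clean, lam=10.0):
    mse, l1 = nn.MSELoss(), nn.L1Loss()
    fake_clean, fake_artifact = G(x_artifact), F_(y_clean)
    # Adversarial terms: fool the discriminator of each target domain.
    adv = mse(D_clean(fake_clean), torch.ones_like(D_clean(fake_clean))) + \
          mse(D_artifact(fake_artifact), torch.ones_like(D_artifact(fake_artifact)))
    # Cycle-consistency term: translating back should recover the input tile.
    cyc = l1(F_(fake_clean), x_artifact) + l1(G(fake_artifact), y_clean)
    return adv + lam * cyc

# Toy usage with identity generators and a 1x1-conv "discriminator" head.
G = F_ = nn.Identity()
D = nn.Conv2d(3, 1, 1)
loss = cycle_losses(G, F_, D, D, torch.randn(2, 3, 64, 64), torch.randn(2, 3, 64, 64))
print(loss.item())
```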

Vector Quantized Image-to-Image Translation

Jul 27, 2022
Yu-Jie Chen, Shin-I Cheng, Wei-Chen Chiu, Hung-Yu Tseng, Hsin-Ying Lee

Current image-to-image translation methods formulate the task with conditional generation models, which leads to learning only recolorization or regional changes, since the models are constrained by the rich structural information provided by the conditional contexts. In this work, we propose introducing the vector quantization technique into the image-to-image translation framework. The vector quantized content representation can facilitate not only the translation, but also the unconditional distribution shared among different domains. Meanwhile, along with the disentangled style representation, the proposed method further enables flexible image extension both within and across domains. Qualitative and quantitative experiments demonstrate that our framework achieves performance comparable to state-of-the-art image-to-image translation and image extension methods. Compared to methods for individual tasks, the proposed method, as a unified framework, unleashes applications that combine image-to-image translation, unconditional generation, and image extension. For example, it provides style variability for image generation and extension, and equips image-to-image translation with further extension capabilities.
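
A hedged sketch of the vector-quantization step at the core of such a framework: each spatial content feature is snapped to its nearest codebook entry, with a straight-through estimator so gradients reach the encoder. Codebook size and dimensions are illustrative assumptions, not the authors' implementation.

```python
# Sketch: vector quantization of a content feature map with a straight-through estimator.
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    def __init__(self, num_codes=512, dim=64):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z):                                      # z: (B, C, H, W) content features
        B, C, H, W = z.shape
        flat = z.permute(0, 2, 3, 1).reshape(-1, C)            # (B*H*W, C)
        dists = torch.cdist(flat, self.codebook.weight)        # distance to every code
        codes = self.codebook(dists.argmin(dim=1))             # nearest codebook entries
        z_q = codes.view(B, H, W, C).permute(0, 3, 1, 2)
        return z + (z_q - z).detach()                          # straight-through gradients

vq = VectorQuantizer()
quantized = vq(torch.randn(1, 64, 16, 16))
print(quantized.shape)  # torch.Size([1, 64, 16, 16])
```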