Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"Image To Image Translation": models, code, and papers

Exploring Generative Adversarial Networks for Image-to-Image Translation in STEM Simulation

Oct 29, 2020
Nick Lawrence, Mingren Shen, Ruiqi Yin, Cloris Feng, Dane Morgan

The use of accurate scanning transmission electron microscopy (STEM) image simulation methods require large computation times that can make their use infeasible for the simulation of many images. Other simulation methods based on linear imaging models, such as the convolution method, are much faster but are too inaccurate to be used in application. In this paper, we explore deep learning models that attempt to translate a STEM image produced by the convolution method to a prediction of the high accuracy multislice image. We then compare our results to those of regression methods. We find that using the deep learning model Generative Adversarial Network (GAN) provides us with the best results and performs at a similar accuracy level to previous regression models on the same dataset. Codes and data for this project can be found in this GitHub repository,

Access Paper or Ask Questions

Stomach 3D Reconstruction Based on Virtual Chromoendoscopic Image Generation

Apr 26, 2020
Aji Resindra Widya, Yusuke Monno, Masatoshi Okutomi, Sho Suzuki, Takuji Gotoda, Kenji Miki

Gastric endoscopy is a standard clinical process that enables medical practitioners to diagnose various lesions inside a patient's stomach. If any lesion is found, it is very important to perceive the location of the lesion relative to the global view of the stomach. Our previous research showed that this could be addressed by reconstructing the whole stomach shape from chromoendoscopic images using a structure-from-motion (SfM) pipeline, in which indigo carmine (IC) blue dye sprayed images were used to increase feature matches for SfM by enhancing stomach surface's textures. However, spraying the IC dye to the whole stomach requires additional time, labor, and cost, which is not desirable for patients and practitioners. In this paper, we propose an alternative way to achieve whole stomach 3D reconstruction without the need of the IC dye by generating virtual IC-sprayed (VIC) images based on image-to-image style translation trained on unpaired real no-IC and IC-sprayed images. We have specifically investigated the effect of input and output color channel selection for generating the VIC images and found that translating no-IC green-channel images to IC-sprayed red-channel images gives the best SfM reconstruction result.

* Accepted for main conference in EMBC 2020 
Access Paper or Ask Questions

Augmenting Colonoscopy using Extended and Directional CycleGAN for Lossy Image Translation

Mar 27, 2020
Shawn Mathew, Saad Nadeem, Sruti Kumari, Arie Kaufman

Colorectal cancer screening modalities, such as optical colonoscopy (OC) and virtual colonoscopy (VC), are critical for diagnosing and ultimately removing polyps (precursors of colon cancer). The non-invasive VC is normally used to inspect a 3D reconstructed colon (from CT scans) for polyps and if found, the OC procedure is performed to physically traverse the colon via endoscope and remove these polyps. In this paper, we present a deep learning framework, Extended and Directional CycleGAN, for lossy unpaired image-to-image translation between OC and VC to augment OC video sequences with scale-consistent depth information from VC, and augment VC with patient-specific textures, color and specular highlights from OC (e.g, for realistic polyp synthesis). Both OC and VC contain structural information, but it is obscured in OC by additional patient-specific texture and specular highlights, hence making the translation from OC to VC lossy. The existing CycleGAN approaches do not handle lossy transformations. To address this shortcoming, we introduce an extended cycle consistency loss, which compares the geometric structures from OC in the VC domain. This loss removes the need for the CycleGAN to embed OC information in the VC domain. To handle a stronger removal of the textures and lighting, a Directional Discriminator is introduced to differentiate the direction of translation (by creating paired information for the discriminator), as opposed to the standard CycleGAN which is direction-agnostic. Combining the extended cycle consistency loss and the Directional Discriminator, we show state-of-the-art results on scale-consistent depth inference for phantom, textured VC and for real polyp and normal colon video sequences. We also present results for realistic pendunculated and flat polyp synthesis from bumps introduced in 3D VC models.

* To appear in CVPR 2020. **First two authors contributed equally to this work 
Access Paper or Ask Questions

Unified cross-modality feature disentangler for unsupervised multi-domain MRI abdomen organs segmentation

Jul 19, 2020
Jue Jiang, Harini Veeraraghavan

Our contribution is a unified cross-modality feature disentagling approach for multi-domain image translation and multiple organ segmentation. Using CT as the labeled source domain, our approach learns to segment multi-modal (T1-weighted and T2-weighted) MRI having no labeled data. Our approach uses a variational auto-encoder (VAE) to disentangle the image content from style. The VAE constrains the style feature encoding to match a universal prior (Gaussian) that is assumed to span the styles of all the source and target modalities. The extracted image style is converted into a latent style scaling code, which modulates the generator to produce multi-modality images according to the target domain code from the image content features. Finally, we introduce a joint distribution matching discriminator that combines the translated images with task-relevant segmentation probability maps to further constrain and regularize image-to-image (I2I) translations. We performed extensive comparisons to multiple state-of-the-art I2I translation and segmentation methods. Our approach resulted in the lowest average multi-domain image reconstruction error of 1.34$\pm$0.04. Our approach produced an average Dice similarity coefficient (DSC) of 0.85 for T1w and 0.90 for T2w MRI for multi-organ segmentation, which was highly comparable to a fully supervised MRI multi-organ segmentation network (DSC of 0.86 for T1w and 0.90 for T2w MRI).

* MICCAI 2020 
* This paper has been accepted by MICCAI2020 
Access Paper or Ask Questions

HYLDA: End-to-end Hybrid Learning Domain Adaptation for LiDAR Semantic Segmentation

Jan 14, 2022
Eduardo R. Corral-Soto, Mrigank Rochan, Yannis Y. He, Shubhra Aich, Yang Liu, Liu Bingbing

In this paper we address the problem of training a LiDAR semantic segmentation network using a fully-labeled source dataset and a target dataset that only has a small number of labels. To this end, we develop a novel image-to-image translation engine, and couple it with a LiDAR semantic segmentation network, resulting in an integrated domain adaptation architecture we call HYLDA. To train the system end-to-end, we adopt a diverse set of learning paradigms, including 1) self-supervision on a simple auxiliary reconstruction task, 2) semi-supervised training using a few available labeled target domain frames, and 3) unsupervised training on the fake translated images generated by the image-to-image translation stage, together with the labeled frames from the source domain. In the latter case, the semantic segmentation network participates in the updating of the image-to-image translation engine. We demonstrate experimentally that HYLDA effectively addresses the challenging problem of improving generalization on validation data from the target domain when only a few target labeled frames are available for training. We perform an extensive evaluation where we compare HYLDA against strong baseline methods using two publicly available LiDAR semantic segmentation datasets.

Access Paper or Ask Questions

A Hand-Held Multimedia Translation and Interpretation System with Application to Diet Management

Jul 17, 2018
Albert Parra, Andrew W. Haddad, Mireille Boutin, Edward J. Delp

We propose a network independent, hand-held system to translate and disambiguate foreign restaurant menu items in real-time. The system is based on the use of a portable multimedia device, such as a smartphones or a PDA. An accurate and fast translation is obtained using a Machine Translation engine and a context-specific corpora to which we apply two pre-processing steps, called translation standardization and $n$-gram consolidation. The phrase-table generated is orders of magnitude lighter than the ones commonly used in market applications, thus making translations computationally less expensive, and decreasing the battery usage. Translation ambiguities are mitigated using multimedia information including images of dishes and ingredients, along with ingredient lists. We implemented a prototype of our system on an iPod Touch Second Generation for English speakers traveling in Spain. Our tests indicate that our translation method yields higher accuracy than translation engines such as Google Translate, and does so almost instantaneously. The memory requirements of the application, including the database of images, are also well within the limits of the device. By combining it with a database of nutritional information, our proposed system can be used to help individuals who follow a medical diet maintain this diet while traveling.

Access Paper or Ask Questions

Beyond Deterministic Translation for Unsupervised Domain Adaptation

Mar 11, 2022
Eleni Chiou, Eleftheria Panagiotaki, Iasonas Kokkinos

In this work we challenge the common approach of using a one-to-one mapping ('translation') between the source and target domains in unsupervised domain adaptation (UDA). Instead, we rely on stochastic translation to capture inherent translation ambiguities. This allows us to (i) train more accurate target networks by generating multiple outputs conditioned on the same source image, leveraging both accurate translation and data augmentation for appearance variability, (ii) impute robust pseudo-labels for the target data by averaging the predictions of a source network on multiple translated versions of a single target image and (iii) train and ensemble diverse networks in the target domain by modulating the degree of stochasticity in the translations. We report improvements over strong recent baselines, leading to state-of-the-art UDA results on two challenging semantic segmentation benchmarks.

Access Paper or Ask Questions

360-Degree Textures of People in Clothing from a Single Image

Aug 20, 2019
Verica Lazova, Eldar Insafutdinov, Gerard Pons-Moll

In this paper we predict a full 3D avatar of a person from a single image. We infer texture and geometry in the UV-space of the SMPL model using an image-to-image translation method. Given partial texture and segmentation layout maps derived from the input view, our model predicts the complete segmentation map, the complete texture map, and a displacement map. The predicted maps can be applied to the SMPL model in order to naturally generalize to novel poses, shapes, and even new clothing. In order to learn our model in a common UV-space, we non-rigidly register the SMPL model to thousands of 3D scans, effectively encoding textures and geometries as images in correspondence. This turns a difficult 3D inference task into a simpler image-to-image translation one. Results on rendered scans of people and images from the DeepFashion dataset demonstrate that our method can reconstruct plausible 3D avatars from a single image. We further use our model to digitally change pose, shape, swap garments between people and edit clothing. To encourage research in this direction we will make the source code available for research purpose.

Access Paper or Ask Questions

GANerated Hands for Real-time 3D Hand Tracking from Monocular RGB

Dec 04, 2017
Franziska Mueller, Florian Bernard, Oleksandr Sotnychenko, Dushyant Mehta, Srinath Sridhar, Dan Casas, Christian Theobalt

We address the highly challenging problem of real-time 3D hand tracking based on a monocular RGB-only sequence. Our tracking method combines a convolutional neural network with a kinematic 3D hand model, such that it generalizes well to unseen data, is robust to occlusions and varying camera viewpoints, and leads to anatomically plausible as well as temporally smooth hand motions. For training our CNN we propose a novel approach for the synthetic generation of training data that is based on a geometrically consistent image-to-image translation network. To be more specific, we use a neural network that translates synthetic images to "real" images, such that the so-generated images follow the same statistical distribution as real-world hand images. For training this translation network we combine an adversarial loss and a cycle-consistency loss with a geometric consistency loss in order to preserve geometric properties (such as hand pose) during translation. We demonstrate that our hand tracking system outperforms the current state-of-the-art on challenging RGB-only footage.

Access Paper or Ask Questions