We address the problem of un-supervised geometric image-to-image translation. Rather than transferring the style of an image as a whole, our goal is to translate the geometry of an object as depicted in different domains while preserving its appearance. Towards this goal, we propose a fully un-paired model that performs shape translation within a single model and without the need of additional post-processing stages. Extensive experiments on the VITON, CMU-Multi-PIE and our own FashionStyle datasets show the effectiveness of the proposed method at achieving the task at hand. In addition, we show that despite their low-dimensionality, the features learned by our model have potential for the item retrieval task
Unsupervised domain adaptation (UDA) aims at adapting the model trained on a labeled source-domain dataset to another target-domain dataset without any annotation. The task of UDA for the open-set person re-identification (re-ID) is even more challenging as the identities (classes) have no overlap between the two domains. Existing UDA methods for person re-ID have the following limitations. 1) Pseudo-label-based methods achieve state-of-the-art performances but ignore the complex relations between two domains' images, along with the valuable source-domain annotations. 2) Domain translation-based methods cannot achieve competitive performances as the domain translation is not properly regularized to generate informative enough training samples that well maintain inter-sample relations. To tackle the above challenges, we propose an end-to-end structured domain adaptation framework that consists of a novel structured domain-translation network and two domain-specific person image encoders. The structured domain-translation network can effectively transform the source-domain images into the target domain while well preserving the original intra- and inter-identity relations. The target-domain encoder could then be trained using both source-to-target translated images with valuable ground-truth labels and target-domain images with pseudo labels. Importantly, the domain-translation network and target-domain encoder are jointly optimized, improving each other towards the overall objective, i.e. to achieve optimal re-ID performances on the target domain. Our proposed framework outperforms state-of-the-art methods on multiple UDA tasks of person re-ID.
Braille is the only system to visually impaired people for reading and writing. However, general people cannot read Braille. So, teachers and relatives find it hard to assist them with learning. Almost every major language has software solutions for this translation purpose. However, in Bengali there is an absence of this useful tool. Here, we propose Braille to Text Translator, which takes image of these tactile alphabets, and translates them to plain text. Image deterioration, scan-time page rotation, and braille dot deformation are the principal issues in this scheme. All of these challenges are directly checked using special image processing and geometric structure analysis. The technique yields 97.25% accuracy in recognizing Braille characters.
Instance segmentation for unlabeled imaging modalities is a challenging but essential task as collecting expert annotation can be expensive and time-consuming. Existing works segment a new modality by either deploying a pre-trained model optimized on diverse training data or conducting domain translation and image segmentation as two independent steps. In this work, we propose a novel Cyclic Segmentation Generative Adversarial Network (CySGAN) that conducts image translation and instance segmentation jointly using a unified framework. Besides the CycleGAN losses for image translation and supervised losses for the annotated source domain, we introduce additional self-supervised and segmentation-based adversarial objectives to improve the model performance by leveraging unlabeled target domain images. We benchmark our approach on the task of 3D neuronal nuclei segmentation with annotated electron microscopy (EM) images and unlabeled expansion microscopy (ExM) data. Our CySGAN outperforms both pretrained generalist models and the baselines that sequentially conduct image translation and segmentation. Our implementation and the newly collected, densely annotated ExM nuclei dataset, named NucExM, are available at https://connectomics-bazaar.github.io/proj/CySGAN/index.html.
Current deep learning approaches for diffusion MRI modeling circumvent the need for densely-sampled diffusion-weighted images (DWIs) by directly predicting microstructural indices from sparsely-sampled DWIs. However, they implicitly make unrealistic assumptions of static $q$-space sampling during training and reconstruction. Further, such approaches can restrict downstream usage of variably sampled DWIs for usages including the estimation of microstructural indices or tractography. We propose a generative adversarial translation framework for high-quality DWI synthesis with arbitrary $q$-space sampling given commonly acquired structural images (e.g., B0, T1, T2). Our translation network linearly modulates its internal representations conditioned on continuous $q$-space information, thus removing the need for fixed sampling schemes. Moreover, this approach enables downstream estimation of high-quality microstructural maps from arbitrarily subsampled DWIs, which may be particularly important in cases with sparsely sampled DWIs. Across several recent methodologies, the proposed approach yields improved DWI synthesis accuracy and fidelity with enhanced downstream utility as quantified by the accuracy of scalar microstructure indices estimated from the synthesized images. Code is available at https://github.com/mengweiren/q-space-conditioned-dwi-synthesis.
Despite the significant advances in recent years, Generative Adversarial Networks (GANs) are still notoriously hard to train. In this paper, we propose three novel curriculum learning strategies for training GANs. All strategies are first based on ranking the training images by their difficulty scores, which are estimated by a state-of-the-art image difficulty predictor. Our first strategy is to divide images into gradually more difficult batches. Our second strategy introduces a novel curriculum loss function for the discriminator that takes into account the difficulty scores of the real images. Our third strategy is based on sampling from an evolving distribution, which favors the easier images during the initial training stages and gradually converges to a uniform distribution, in which samples are equally likely, regardless of difficulty. We compare our curriculum learning strategies with the classic training procedure on two tasks: image generation and image translation. Our experiments indicate that all strategies provide faster convergence and superior results. For example, our best curriculum learning strategy applied on spectrally normalized GANs (SNGANs) fooled human annotators in thinking that generated CIFAR-like images are real in 25.0% of the presented cases, while the SNGANs trained using the classic procedure fooled the annotators in only 18.4% cases. Similarly, in image translation, the human annotators preferred the images produced by the Cycle-consistent GAN (CycleGAN) trained using curriculum learning in 40.5% cases and those produced by CycleGAN based on classic training in only 19.8% cases, 39.7% cases being labeled as ties.
We cast the problem of image denoising as a domain translation problem between high and low noise domains. By modifying the cycleGAN model, we are able to learn a mapping between these domains on unpaired retinal optical coherence tomography images. In quantitative measurements and a qualitative evaluation by ophthalmologists, we show how this approach outperforms other established methods. The results indicate that the network differentiates subtle changes in the level of noise in the image. Further investigation of the model's feature maps reveals that it has learned to distinguish retinal layers and other distinct regions of the images.
This paper proposes the first non-flow-based deep framework for high dynamic range (HDR) imaging of dynamic scenes with large-scale foreground motions. In state-of-the-art deep HDR imaging, input images are first aligned using optical flows before merging, which are still error-prone due to occlusion and large motions. In stark contrast to flow-based methods, we formulate HDR imaging as an image translation problem without optical flows. Moreover, our simple translation network can automatically hallucinate plausible HDR details in the presence of total occlusion, saturation and under-exposure, which are otherwise almost impossible to recover by conventional optimization approaches. Our framework can also be extended for different reference images. We performed extensive qualitative and quantitative comparisons to show that our approach produces excellent results where color artifacts and geometric distortions are significantly reduced compared to existing state-of-the-art methods, and is robust across various inputs, including images without radiometric calibration.
We propose a universal image reconstruction method to represent detailed images purely from binary sparse edge and flat color domain. Inspired by the procedures of painting, our framework, based on generative adversarial network, consists of three phases: Imitation Phase aims at initializing networks, followed by Generating Phase to reconstruct preliminary images. Moreover, Refinement Phase is utilized to fine-tune preliminary images into final outputs with details. This framework allows our model generating abundant high frequency details from sparse input information. We also explore the defects of disentangling style latent space implicitly from images, and demonstrate that explicit color domain in our model performs better on controllability and interpretability. In our experiments, we achieve outstanding results on reconstructing realistic images and translating hand drawn drafts into satisfactory paintings. Besides, within the domain of edge-to-image translation, our model PI-REC outperforms existing state-of-the-art methods on evaluations of realism and accuracy, both quantitatively and qualitatively.