"Image To Image Translation": models, code, and papers

Exploiting BERT For Multimodal Target Sentiment Classification Through Input Space Translation

Aug 03, 2021
Zaid Khan, Yun Fu

Multimodal target/aspect sentiment classification combines multimodal sentiment analysis and aspect/target sentiment classification. The goal of the task is to combine vision and language to understand the sentiment towards a target entity in a sentence. Twitter is an ideal setting for the task because it is inherently multimodal, highly emotional, and affects real world events. However, multimodal tweets are short and accompanied by complex, possibly irrelevant images. We introduce a two-stream model that translates images in input space using an object-aware transformer followed by a single-pass non-autoregressive text generation approach. We then leverage the translation to construct an auxiliary sentence that provides multimodal information to a language model. Our approach increases the amount of text available to the language model and distills the object-level information in complex images. We achieve state-of-the-art performance on two multimodal Twitter datasets without modifying the internals of the language model to accept multimodal data, demonstrating the effectiveness of our translation. In addition, we explain a failure mode of a popular approach for aspect sentiment analysis when applied to tweets. Our code is available at https://github.com/codezakh/exploiting-BERT-thru-translation.
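The central trick, turning the image into text and supplying it as an auxiliary sentence so a stock BERT can consume it as an ordinary sentence pair, can be sketched with the Hugging Face transformers API. The auxiliary sentence below is a hard-coded placeholder for the paper's image-to-text translation, and the model name and label set are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch: feed (tweet, auxiliary sentence) to an unmodified BERT classifier.
# The auxiliary sentence would come from the image-to-text translation module;
# here it is a placeholder. Model name and label count are assumptions.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3  # negative / neutral / positive (assumed)
)

tweet = "Just watched the match with $T$, what a performance!"  # $T$ marks the target
auxiliary = "person celebrating on a soccer field"  # hypothetical image translation

# Sentence-pair encoding: BERT's segment embeddings separate the tweet from the
# auxiliary text, so no architectural change is needed to inject visual information.
inputs = tokenizer(tweet, auxiliary, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(dim=-1))
```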

* ACM Multimedia 2021 
  

Two-phase Hair Image Synthesis by Self-Enhancing Generative Model

Feb 28, 2019
Haonan Qiu, Chuan Wang, Hang Zhu, Xiangyu Zhu, Jinjin Gu, Xiaoguang Han

Generating plausible hair images given limited guidance, such as sparse sketches or a low-resolution image, has been made possible with the rise of Generative Adversarial Networks (GANs). Traditional image-to-image translation networks can generate recognizable results, but finer textures are usually lost and blur artifacts commonly exist. In this paper, we propose a two-phase generative model for high-quality hair image synthesis. The two-phase pipeline first generates a coarse image by an existing image translation model, then applies a re-generating network with self-enhancing capability to the coarse image. The self-enhancing capability is achieved by a proposed structure extraction layer, which extracts the texture and orientation map from a hair image. Extensive experiments on two tasks, Sketch2Hair and Hair Super-Resolution, demonstrate that our approach is able to synthesize plausible hair images with finer details, and outperforms the state of the art.
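A rough PyTorch sketch of the two-phase idea: a coarse result from an off-the-shelf translation network is re-generated by an enhancer that also sees a structure map extracted from the coarse image. The Sobel-gradient orientation map and the placeholder networks below are stand-ins, not the paper's structure extraction layer or architecture.

```python
# Sketch of the two-phase pipeline (assumed NCHW tensors, 3-channel RGB).
# `coarse_net` and `enhance_net` stand in for the translation and re-generating
# networks; the orientation map is a simple Sobel-based proxy for the proposed
# structure extraction layer.
import torch
import torch.nn.functional as F

def orientation_map(img: torch.Tensor) -> torch.Tensor:
    """Per-pixel gradient orientation of the grayscale image, in radians."""
    gray = img.mean(dim=1, keepdim=True)
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=gray.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    gx = F.conv2d(gray, kx, padding=1)
    gy = F.conv2d(gray, ky, padding=1)
    return torch.atan2(gy, gx)

def two_phase_forward(sketch, coarse_net, enhance_net):
    coarse = coarse_net(sketch)             # phase 1: existing translation model
    structure = orientation_map(coarse)     # extract texture/orientation cues
    refined = enhance_net(torch.cat([coarse, structure], dim=1))  # phase 2
    return coarse, refined
```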

  

Estimation of Tissue Oxygen Saturation from RGB Images based on Pixel-level Image Translation

Apr 19, 2018
Qing-Biao Li, Xiao-Yun Zhou, Jianyu Lin, Jian-Qing Zheng, Neil T. Clancy, Daniel S. Elson

Intra-operative measurement of tissue oxygen saturation (StO2) has been widely explored by pulse oximetry or hyperspectral imaging (HSI) to assess the function and viability of tissue. In this paper we propose a pixel-level image-to-image translation approach based on conditional Generative Adversarial Networks (cGAN) to estimate tissue oxygen saturation (StO2) directly from RGB images. The real-time performance and non-reliance on additional hardware enable a seamless integration of the proposed method into surgical and diagnostic workflows with standard endoscope systems. For validation, RGB images and StO2 ground truth were simulated and estimated from HSI images collected by a liquid crystal tuneable filter (LCTF) endoscope for three tissue types (porcine bowel, lamb uterus and rabbit uterus). The results show that the proposed method can achieve visually identical images with comparable accuracy.
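The objective is essentially pix2pix-style: a generator maps RGB to an StO2 map and is trained with an adversarial loss from a conditional discriminator plus a pixel-wise reconstruction term. A hedged single-step sketch follows; the network definitions, loss weight, and BCE formulation are assumptions, not the paper's exact setup.

```python
# One cGAN training step for RGB -> StO2 regression (pix2pix-style sketch).
# `G`, `D`, the optimizers, and the L1 weight are placeholders.
import torch
import torch.nn.functional as F

def cgan_step(G, D, opt_g, opt_d, rgb, sto2_gt, lambda_l1=100.0):
    # --- discriminator: real (rgb, gt) pairs vs fake (rgb, G(rgb)) pairs ---
    fake = G(rgb)
    d_real = D(torch.cat([rgb, sto2_gt], dim=1))
    d_fake = D(torch.cat([rgb, fake.detach()], dim=1))
    d_loss = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # --- generator: fool D while staying close to the ground-truth StO2 map ---
    d_fake = D(torch.cat([rgb, fake], dim=1))
    g_loss = (F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
              + lambda_l1 * F.l1_loss(fake, sto2_gt))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```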

  

ReMix: Towards Image-to-Image Translation with Limited Data

Mar 31, 2021
Jie Cao, Luanxuan Hou, Ming-Hsuan Yang, Ran He, Zhenan Sun

Image-to-image (I2I) translation methods based on generative adversarial networks (GANs) typically suffer from overfitting when limited training data is available. In this work, we propose a data augmentation method (ReMix) to tackle this issue. We interpolate training samples at the feature level and propose a novel content loss based on the perceptual relations among samples. The generator learns to translate the in-between samples rather than memorizing the training set, and thereby forces the discriminator to generalize. The proposed approach effectively reduces the ambiguity of generation and renders content-preserving results. The ReMix method can be easily incorporated into existing GAN models with minor modifications. Experimental results on numerous tasks demonstrate that GAN models equipped with the ReMix method achieve significant improvements.
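The core of ReMix, interpolating pairs of training samples at the feature level and supervising the in-between output through its perceptual relation to the two endpoints, can be sketched as below. The encoder/decoder split, the Beta-distributed mixing coefficient, and the particular relational content loss are assumptions made for illustration, not the released implementation.

```python
# Sketch of ReMix-style feature-level interpolation with a relational content loss.
# `enc`/`dec` form the generator; `perceptual` is any fixed feature extractor (e.g. VGG).
import torch

def remix_step(enc, dec, perceptual, x_a, x_b, alpha=1.0):
    lam = torch.distributions.Beta(alpha, alpha).sample().to(x_a.device)

    # Interpolate the two samples in feature space rather than pixel space.
    f_mix = lam * enc(x_a) + (1.0 - lam) * enc(x_b)
    y_mix = dec(f_mix)

    # Relational content loss (illustrative): the in-between output should be
    # perceptually closer to whichever endpoint dominates the mix.
    d_a = torch.norm(perceptual(y_mix) - perceptual(x_a))
    d_b = torch.norm(perceptual(y_mix) - perceptual(x_b))
    content_loss = (lam * d_a + (1.0 - lam) * d_b) / (d_a + d_b + 1e-8)
    return y_mix, content_loss
```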

* CVPR 2021 
  

Style-Restricted GAN: Multi-Modal Translation with Style Restriction Using Generative Adversarial Networks

May 17, 2021
Sho Inoue, Tad Gonsalves

Unpaired image-to-image translation using Generative Adversarial Networks (GAN) is successful in converting images among multiple domains. Moreover, recent studies have shown a way to diversify the outputs of the generator. However, since there are no restrictions on how the generator diversifies the results, it is likely to translate some unexpected features. In this paper, we propose Style-Restricted GAN (SRGAN), a novel approach to translate input images into different domains with different styles, changing only the class-related features. Additionally, instead of the KL divergence loss, we adopt three new losses to restrict the distribution of the encoded features: a batch KL divergence loss, a correlation loss, and a histogram imitation loss. The study reports quantitative as well as qualitative results with Precision, Recall, Density, and Coverage. The proposed three losses lead to an enhanced level of diversity compared to the conventional KL loss. In particular, SRGAN is found to be successful in translating with higher diversity and without changing the class-unrelated features in the CelebA face dataset. Our implementation is available at https://github.com/shinshoji01/Style-Restricted_GAN.
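One way to read the three regularizers is as constraints on a batch of encoded style vectors: the batch statistics should look standard-normal per dimension (batch KL), the dimensions should be decorrelated (correlation loss), and the empirical histogram of each dimension should imitate a Gaussian (histogram imitation). The sketch below is an interpretation for illustration only, not the paper's official definitions.

```python
# Illustrative losses on a batch of encoded style codes z of shape (N, D).
# One plausible reading of the batch KL / correlation / histogram imitation
# losses; the exact formulations may differ from the paper.
import torch

def batch_kl_loss(z):
    # KL( N(mu_batch, var_batch) || N(0, 1) ) per dimension, averaged.
    mu, var = z.mean(dim=0), z.var(dim=0, unbiased=False) + 1e-8
    return 0.5 * (var + mu ** 2 - 1.0 - var.log()).mean()

def correlation_loss(z):
    # Penalize off-diagonal entries of the batch correlation matrix.
    zc = (z - z.mean(dim=0)) / (z.std(dim=0) + 1e-8)
    corr = zc.t() @ zc / z.shape[0]
    off_diag = corr - torch.diag(torch.diag(corr))
    return (off_diag ** 2).mean()

def histogram_imitation_loss(z, bins=32, lim=3.0):
    # Compare a soft histogram of each dimension with that of a standard-normal
    # reference sample of the same size.
    ref = torch.randn_like(z)
    edges = torch.linspace(-lim, lim, bins, device=z.device)
    def soft_hist(x):  # (N, D) -> (bins, D), Gaussian kernel around each bin center
        w = torch.exp(-((x.unsqueeze(0) - edges.view(-1, 1, 1)) ** 2) / (2 * 0.25 ** 2))
        return w.sum(dim=1) / x.shape[0]
    return (soft_hist(z) - soft_hist(ref)).abs().mean()
```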

* 18 pages, 13 figures, 6 tables; This paper is submitted to IEEE Access; Our implementation is available at https://github.com/shinshoji01/Style-Restricted_GAN 
  

A Corpus for English-Japanese Multimodal Neural Machine Translation with Comparable Sentences

Oct 17, 2020
Andrew Merritt, Chenhui Chu, Yuki Arase

Multimodal neural machine translation (NMT) has become an increasingly important area of research over the years because additional modalities, such as image data, can provide more context to textual data. Furthermore, the viability of training multimodal NMT models without a large parallel corpus continues to be investigated due to low availability of parallel sentences with images, particularly for English-Japanese data. However, this void can be filled with comparable sentences that contain bilingual terms and parallel phrases, which are naturally created through media such as social network posts and e-commerce product descriptions. In this paper, we propose a new multimodal English-Japanese corpus with comparable sentences that are compiled from existing image captioning datasets. In addition, we supplement our comparable sentences with a smaller parallel corpus for validation and test purposes. To test the performance of this comparable sentence translation scenario, we train several baseline NMT models with our comparable corpus and evaluate their English-Japanese translation performance. Due to low translation scores in our baseline experiments, we believe that current multimodal NMT models are not designed to effectively utilize comparable sentence data. Despite this, we hope for our corpus to be used to further research into multimodal NMT with comparable sentences.

  

Visualizing Adapted Knowledge in Domain Transfer

May 01, 2021
Yunzhong Hou, Liang Zheng

A source model trained on source data and a target model learned through unsupervised domain adaptation (UDA) usually encode different knowledge. To understand the adaptation process, we portray their knowledge difference with image translation. Specifically, we feed a translated image and its original version to the two models respectively, formulating two branches. Through updating the translated image, we force similar outputs from the two branches. When such requirements are met, differences between the two images can compensate for and hence represent the knowledge difference between models. To enforce similar outputs from the two branches and depict the adapted knowledge, we propose a source-free image translation method that generates source-style images using only target images and the two models. We visualize the adapted knowledge on several datasets with different UDA methods and find that generated images successfully capture the style difference between the two domains. For application, we show that generated images enable further tuning of the target model without accessing source data. Code available at https://github.com/hou-yz/DA_visualization.
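As the abstract describes, the visualization amounts to optimizing the translated image so that the source model's prediction on it matches the target model's prediction on the original image; the resulting image difference then depicts the adapted knowledge. A minimal sketch follows, with the optimizer, loss choice, and iteration count as assumptions.

```python
# Sketch: optimize the pixels of a copy of the target image so that
# source_model(translated) matches target_model(original). Classification
# outputs and hyperparameters are illustrative assumptions.
import torch
import torch.nn.functional as F

def translate_image(source_model, target_model, image, steps=200, lr=0.01):
    source_model.eval(); target_model.eval()
    with torch.no_grad():
        target_probs = target_model(image).softmax(dim=-1)

    translated = image.clone().requires_grad_(True)
    opt = torch.optim.Adam([translated], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        src_log_probs = source_model(translated).log_softmax(dim=-1)
        # Force the two branches to agree; the image difference then encodes
        # the knowledge difference between the source and adapted models.
        loss = F.kl_div(src_log_probs, target_probs, reduction="batchmean")
        loss.backward()
        opt.step()
    return translated.detach()
```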

* CVPR 2021 
  

A Novel Histogram Based Robust Image Registration Technique

Feb 23, 2014
V. Karthikeyan

In this paper, a histogram-based method for Automatic Image Registration (AIR) is proposed. Automatic image registration is one of the crucial steps in the analysis of remotely sensed data. A newly acquired image must be transformed, using image registration techniques, to match the orientation and scale of previous related images. The new approach combines several segmentations of the pair of images to be registered. A relaxation parameter on the delineation of histogram modes is introduced, followed by characterization of the extracted objects through their area, axis ratio, perimeter, and fractal dimension. The matched objects are used for rotation and translation estimation. This allows for the registration of pairs of images that differ in rotation and translation, and the method contributes to subpixel accuracy.
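The pipeline, segmenting both images from their histograms, describing the resulting objects by shape statistics, matching objects, and recovering the transform from the matches, can be approximated with scikit-image primitives. The thresholding strategy, feature set, and matching rule below are simplifications; rotation estimation and the relaxation parameter on histogram-mode delineation are omitted.

```python
# Simplified sketch of histogram-based registration: threshold from the histogram,
# describe labeled objects, match them by shape features, and estimate the shift
# from matched centroids (translation only).
import numpy as np
from skimage.filters import threshold_otsu
from skimage.measure import label, regionprops

def describe_objects(img):
    mask = img > threshold_otsu(img)   # stand-in for histogram-mode segmentation
    feats = []
    for p in regionprops(label(mask)):
        shape = np.array([p.area, p.perimeter + 1e-8,
                          p.major_axis_length / (p.minor_axis_length + 1e-8)])
        feats.append((np.array(p.centroid), shape))
    return feats

def estimate_translation(img_a, img_b):
    objs_a, objs_b = describe_objects(img_a), describe_objects(img_b)
    shifts = []
    for ca, fa in objs_a:
        # Match each object in A to the most similar object in B
        # (relative feature distance), then record the centroid offset.
        cb, _ = min(objs_b, key=lambda o: np.linalg.norm((o[1] - fa) / (fa + 1e-8)))
        shifts.append(cb - ca)
    return np.median(shifts, axis=0)  # robust translation estimate (row, col)
```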

* 5 pages and 6 figures. submit/0850305 
  