Zhe Gan

Domain Adaptive Text Style Transfer

Aug 25, 2019
Dianqi Li, Yizhe Zhang, Zhe Gan, Yu Cheng, Chris Brockett, Ming-Ting Sun, Bill Dolan

Text style transfer without parallel data has achieved some practical success. However, in the scenario where less data is available, these methods may yield poor performance. In this paper, we examine domain adaptation for text style transfer to leverage massively available data from other domains. These data may demonstrate domain shift, which impedes the benefits of utilizing such data for training. To address this challenge, we propose simple yet effective domain adaptive text style transfer models, enabling domain-adaptive information exchange. The proposed models presumably learn from the source domain to: (i) distinguish stylized information and generic content information; (ii) maximally preserve content information; and (iii) adaptively transfer the styles in a domain-aware manner. We evaluate the proposed models on two style transfer tasks (sentiment and formality) over multiple target domains where only limited non-parallel data is available. Extensive experiments demonstrate the effectiveness of the proposed model compared to the baselines.

* EMNLP 2019, long paper

Patient Knowledge Distillation for BERT Model Compression

Aug 25, 2019
Siqi Sun, Yu Cheng, Zhe Gan, Jingjing Liu

Pre-trained language models such as BERT have proven to be highly effective for natural language processing (NLP) tasks. However, the high demand for computing resources in training such models hinders their application in practice. In order to alleviate this resource hunger in large-scale model training, we propose a Patient Knowledge Distillation approach to compress an original large model (teacher) into an equally-effective lightweight shallow network (student). Different from previous knowledge distillation methods, which only use the output from the last layer of the teacher network for distillation, our student model patiently learns from multiple intermediate layers of the teacher model for incremental knowledge extraction, following two strategies: (i) PKD-Last: learning from the last k layers; and (ii) PKD-Skip: learning from every k layers. These two patient distillation schemes enable the exploitation of rich information in the teacher's hidden layers, and encourage the student model to patiently learn from and imitate the teacher through a multi-layer distillation process. Empirically, this translates into improved results on multiple NLP tasks with significant gain in training efficiency, without sacrificing model accuracy.

* Accepted to EMNLP 2019
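
The two layer-selection strategies named in the abstract can be illustrated with a minimal sketch. This is a literal reading of "the last k layers" and "every k layers" only; the function names, the 1-based indexing, and the 12-layer teacher are assumptions for illustration, and the paper itself may treat the final layer differently.

```python
from typing import List


def pkd_last(teacher_layers: int, k: int) -> List[int]:
    """Hypothetical helper: 1-based indices of the last k teacher layers."""
    if not 0 < k <= teacher_layers:
        raise ValueError("k must be between 1 and teacher_layers")
    return list(range(teacher_layers - k + 1, teacher_layers + 1))


def pkd_skip(teacher_layers: int, k: int) -> List[int]:
    """Hypothetical helper: 1-based index of every k-th teacher layer."""
    if not 0 < k <= teacher_layers:
        raise ValueError("k must be between 1 and teacher_layers")
    return list(range(k, teacher_layers + 1, k))


# Assumed setting: a 12-layer teacher.
print(pkd_last(12, 5))  # layers closest to the output
print(pkd_skip(12, 2))  # layers spread evenly through the network
```

Under this reading, PKD-Last concentrates supervision near the teacher's output, while PKD-Skip samples hidden states from across the whole depth of the teacher.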

Adversarial Domain Adaptation for Machine Reading Comprehension

Aug 24, 2019
Huazheng Wang, Zhe Gan, Xiaodong Liu, Jingjing Liu, Jianfeng Gao, Hongning Wang

In this paper, we focus on unsupervised domain adaptation for Machine Reading Comprehension (MRC), where the source domain has a large amount of labeled data, while only unlabeled passages are available in the target domain. To this end, we propose an Adversarial Domain Adaptation framework (AdaMRC), where (i) pseudo questions are first generated for unlabeled passages in the target domain, and then (ii) a domain classifier is incorporated into an MRC model to predict which domain a given passage-question pair comes from. The classifier and the passage-question encoder are jointly trained using adversarial learning to enforce domain-invariant representation learning. Comprehensive evaluations demonstrate that our approach (i) is generalizable to different MRC models and datasets, (ii) can be combined with pre-trained large-scale language models (such as ELMo and BERT), and (iii) can be extended to semi-supervised learning.

* Accepted to EMNLP 2019

Tactical Rewind: Self-Correction via Backtracking in Vision-and-Language Navigation

Apr 02, 2019
Liyiming Ke, Xiujun Li, Yonatan Bisk, Ari Holtzman, Zhe Gan, Jingjing Liu, Jianfeng Gao, Yejin Choi, Siddhartha Srinivasa

We present the Frontier Aware Search with backTracking (FAST) Navigator, a general framework for action decoding that achieves state-of-the-art results on the Room-to-Room (R2R) Vision-and-Language navigation challenge of Anderson et al. (2018). Given a natural language instruction and photo-realistic image views of a previously unseen environment, the agent is tasked with navigating from source to target location as quickly as possible. While all current approaches make local action decisions or score entire trajectories using beam search, ours balances local and global signals when exploring an unobserved environment. Importantly, this lets us act greedily but use global signals to backtrack when necessary. Applying the FAST framework to existing state-of-the-art models achieved a 17% relative gain, an absolute 6% gain on Success rate weighted by Path Length (SPL).

* CVPR 2019 Oral, video demo: https://youtu.be/AD9TNohXoPA
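
For readers unfamiliar with the SPL metric mentioned above, the following is a minimal sketch of its commonly used definition: each successful episode contributes the ratio of the shortest path length to the longer of the shortest path and the path actually taken, and the result is averaged over all episodes. The function name and the sample numbers are hypothetical, and path lengths are assumed to be positive.

```python
from typing import Sequence


def spl(successes: Sequence[bool],
        shortest: Sequence[float],
        taken: Sequence[float]) -> float:
    """Success rate weighted by Path Length over a batch of episodes.

    successes[i]: whether episode i reached the target
    shortest[i]:  shortest-path length from start to target (assumed > 0)
    taken[i]:     length of the path the agent actually followed
    """
    if not successes or not (len(successes) == len(shortest) == len(taken)):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for ok, best, actual in zip(successes, shortest, taken):
        if ok:
            # A detour (actual > best) shrinks the credit for a success.
            total += best / max(actual, best)
    return total / len(successes)


# Made-up episodes: one optimal success, one success with a detour, one failure.
score = spl([True, True, False], [10.0, 10.0, 10.0], [10.0, 20.0, 12.0])
print(score)
```

Because failed episodes contribute zero and detours are penalized, an agent that backtracks too freely can lose SPL even when its raw success rate improves.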

Relation-aware Graph Attention Network for Visual Question Answering

Mar 29, 2019
Linjie Li, Zhe Gan, Yu Cheng, Jingjing Liu

In order to answer semantically-complicated questions about an image, a Visual Question Answering (VQA) model needs to fully understand the visual scene in the image, especially the interactive dynamics between different objects. We propose a Relation-aware Graph Attention Network (ReGAT), which encodes each image into a graph and models multi-type inter-object relations via a graph attention mechanism, to learn question-adaptive relation representations. Two types of visual object relations are explored: (i) Explicit Relations that represent geometric positions and semantic interactions between objects; and (ii) Implicit Relations that capture the hidden dynamics between image regions. Experiments demonstrate that ReGAT outperforms prior state-of-the-art approaches on both VQA 2.0 and VQA-CP v2 datasets. We further show that ReGAT is compatible with existing VQA architectures, and can be used as a generic relation encoder to boost the model performance for VQA.

Topic-Guided Variational Autoencoders for Text Generation

Mar 17, 2019
Wenlin Wang, Zhe Gan, Hongteng Xu, Ruiyi Zhang, Guoyin Wang, Dinghan Shen, Changyou Chen, Lawrence Carin

We propose a topic-guided variational autoencoder (TGVAE) model for text generation. Distinct from existing variational autoencoder (VAE) based approaches, which assume a simple Gaussian prior for the latent code, our model specifies the prior as a Gaussian mixture model (GMM) parametrized by a neural topic module. Each mixture component corresponds to a latent topic, which provides guidance to generate sentences under the topic. The neural topic module and the VAE-based neural sequence module in our model are learned jointly. In particular, a sequence of invertible Householder transformations is applied to endow the approximate posterior of the latent code with high flexibility during model inference. Experimental results show that our TGVAE outperforms alternative approaches on both unconditional and conditional text generation, and can generate semantically-meaningful sentences with various topics.
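
As a rough illustration of the Householder transformations mentioned in the abstract, the sketch below applies the standard reflection H = I - 2 v v^T / (v^T v) to a latent vector, and chains several such reflections. The function names and the fixed reflection vectors are assumptions for illustration; in a model such as the one described, the vectors would presumably be produced by a network rather than hard-coded.

```python
import math
from typing import List, Sequence


def householder(z: Sequence[float], v: Sequence[float]) -> List[float]:
    """Reflect z across the hyperplane orthogonal to v: z - 2 v (v.z)/(v.v)."""
    vv = sum(x * x for x in v)
    if vv == 0.0:
        raise ValueError("reflection vector must be non-zero")
    scale = 2.0 * sum(a * b for a, b in zip(v, z)) / vv
    return [zi - scale * vi for zi, vi in zip(z, v)]


def householder_flow(z: Sequence[float],
                     vs: Sequence[Sequence[float]]) -> List[float]:
    """Apply a sequence of Householder reflections, one per vector in vs."""
    out = list(z)
    for v in vs:
        out = householder(out, v)
    return out


z0 = [3.0, 4.0]                         # hypothetical latent sample
z1 = householder_flow(z0, [[1.0, 0.0], [1.0, 1.0]])
# Each reflection is orthogonal, so the Euclidean norm is unchanged.
print(math.hypot(*z0), math.hypot(*z1))
```

Each reflection is its own inverse and has unit absolute Jacobian determinant, which is what makes a chain of them a convenient invertible transformation of a latent code.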

Multi-step Reasoning via Recurrent Dual Attention for Visual Dialog

Feb 01, 2019
Zhe Gan, Yu Cheng, Ahmed El Kholy, Linjie Li, Jingjing Liu, Jianfeng Gao

This paper presents the Recurrent Dual Attention Network (ReDAN) for visual dialog, using multi-step reasoning to answer a series of questions about an image. In each turn of the dialog, ReDAN infers answers progressively through multiple steps. In each step, a recurrently-updated semantic representation of the (refined) query is used for iterative reasoning over both the image and previous dialog history. Experimental results on the VisDial v1.0 dataset show that the proposed ReDAN model outperforms prior state-of-the-art approaches across multiple evaluation metrics. Visualization of the iterative reasoning process further demonstrates that ReDAN can locate context-relevant visual and textual clues leading to the correct answers step-by-step.

Improving Sequence-to-Sequence Learning via Optimal Transport

Jan 18, 2019
Liqun Chen, Yizhe Zhang, Ruiyi Zhang, Chenyang Tao, Zhe Gan, Haichao Zhang, Bai Li, Dinghan Shen, Changyou Chen, Lawrence Carin

Sequence-to-sequence models are commonly trained via maximum likelihood estimation (MLE). However, standard MLE training considers a word-level objective, predicting the next word given the previous ground-truth partial sentence. This procedure focuses on modeling local syntactic patterns, and may fail to capture long-range semantic structure. We present a novel solution to alleviate these issues. Our approach imposes global sequence-level guidance via new supervision based on optimal transport, enabling the overall characterization and preservation of semantic features. We further show that this method can be understood as a Wasserstein gradient flow trying to match our model to the ground truth sequence distribution. Extensive experiments are conducted to validate the utility of the proposed approach, showing consistent improvements over a wide variety of NLP tasks, including machine translation, abstractive text summarization, and image captioning.
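
To give a feel for what an optimal-transport distance between two sequences involves, here is a minimal sketch using Sinkhorn iterations with entropic regularization. This is a standard approximate OT solver chosen for brevity and is not necessarily the solver used in the paper; the function names, the regularization strength, and the toy cost matrix (standing in for distances between word embeddings of a generated and a reference sentence) are all assumptions.

```python
import math
from typing import List, Sequence


def sinkhorn_plan(cost: Sequence[Sequence[float]],
                  a: Sequence[float],
                  b: Sequence[float],
                  eps: float = 0.1,
                  iters: int = 200) -> List[List[float]]:
    """Approximate transport plan between marginals a and b (Sinkhorn)."""
    n, m = len(a), len(b)
    # Gibbs kernel: cheap pairs get weights near 1, expensive pairs near 0.
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def transport_cost(cost: Sequence[Sequence[float]],
                   plan: Sequence[Sequence[float]]) -> float:
    """Total cost of moving mass according to the plan."""
    return sum(c * p for c_row, p_row in zip(cost, plan)
               for c, p in zip(c_row, p_row))


# Toy cost: token i of one sentence matches token i of the other at zero cost.
toy_cost = [[0.0, 1.0], [1.0, 0.0]]
uniform = [0.5, 0.5]
plan = sinkhorn_plan(toy_cost, uniform, uniform)
print(transport_cost(toy_cost, plan))  # small: almost all mass stays on the diagonal
```

Unlike a word-level likelihood, the transport cost depends on how the two sequences match as wholes, which is the sense in which it provides sequence-level guidance.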

Sequential Attention GAN for Interactive Image Editing via Dialogue

Dec 20, 2018
Yu Cheng, Zhe Gan, Yitong Li, Jingjing Liu, Jianfeng Gao

In this paper, we introduce a new task: interactive image editing via conversational language, where users can guide an agent to edit images via multi-turn dialogue in natural language. In each dialogue turn, the agent takes a source image and a natural language description from the user as the input, and generates a target image following the textual description. Two new datasets are created for this task, Zap-Seq and DeepFashion-Seq, collected via crowdsourcing. For this task, we propose a new Sequential Attention Generative Adversarial Network (SeqAttnGAN) framework, which applies a neural state tracker to encode both source image and textual descriptions, and generates high-quality images in each dialogue turn. To achieve better region-specific text-to-image generation, we also introduce an attention mechanism into the model. Experiments on the two datasets, including quantitative evaluation and user study, show that our model outperforms state-of-the-art approaches in both image quality and text-to-image consistency.

StoryGAN: A Sequential Conditional GAN for Story Visualization

Dec 06, 2018
Yitong Li, Zhe Gan, Yelong Shen, Jingjing Liu, Yu Cheng, Yuexin Wu, Lawrence Carin, David Carlson, Jianfeng Gao

In this work we propose a new task called Story Visualization. Given a multi-sentence paragraph, the story is visualized by generating a sequence of images, one for each sentence. In contrast to video generation, story visualization focuses less on the continuity in generated images (frames), but more on the global consistency across dynamic scenes and characters -- a challenge that has not been addressed by any single-image or video generation methods. Therefore, we propose a new story-to-image-sequence generation model, StoryGAN, based on the sequential conditional GAN framework. Our model is unique in that it consists of a deep Context Encoder that dynamically tracks the story flow, and two discriminators at the story and image levels, respectively, to enhance the image quality and the consistency of the generated sequences. To evaluate the model, we modified existing datasets to create the CLEVR-SV and Pororo-SV datasets. Empirically, StoryGAN outperformed state-of-the-art models in image quality, contextual consistency metrics, and human evaluation.
