Ming Zhou

Department of Pathology, UT Southwestern Medical Center, Dallas, TX, USA

Regularizing Neural Machine Translation by Target-bidirectional Agreement

Aug 13, 2018

Zhirui Zhang, Shuangzhi Wu, Shujie Liu, Mu Li, Ming Zhou, Enhong Chen

Abstract: Although Neural Machine Translation (NMT) has achieved remarkable progress in the past several years, most NMT systems still suffer from a fundamental shortcoming shared with other sequence generation tasks: errors made early in the generation process are fed back as inputs to the model and can be quickly amplified, harming subsequent generation. To address this issue, we propose a novel model regularization method for NMT training that aims to improve the agreement between translations generated by left-to-right (L2R) and right-to-left (R2L) NMT decoders. This is achieved by introducing two Kullback-Leibler divergence regularization terms into the NMT training objective to reduce the mismatch between the output probabilities of the L2R and R2L models. In addition, we employ a joint training strategy that allows the L2R and R2L models to improve each other in an interactive update process. Experimental results show that our proposed method significantly outperforms state-of-the-art baselines on Chinese-English and English-German translation tasks.
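
The two KL regularization terms described in this abstract can be illustrated with a minimal sketch. It assumes that both decoders' per-position output distributions over the vocabulary are available and that the R2L distributions have already been reversed into left-to-right order. The function names, the weight `lam`, and the token-level form of the KL terms are hypothetical simplifications, not the paper's exact formulation or estimation procedure.

```python
import math
from typing import List, Sequence


def kl(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    # Terms with p_i == 0 contribute nothing (0 * log 0 is taken as 0).
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def agreement_loss(nll_l2r: float, nll_r2l: float,
                   probs_l2r: List[Sequence[float]],
                   probs_r2l: List[Sequence[float]],
                   lam: float = 0.5) -> float:
    """Hypothetical combined objective: both NLL terms plus lam times the two
    KL terms (R2L || L2R and L2R || R2L), summed over aligned target positions.
    Assumes probs_r2l is already reversed to left-to-right order."""
    kl_terms = sum(kl(pr, pl) + kl(pl, pr) for pl, pr in zip(probs_l2r, probs_r2l))
    return nll_l2r + nll_r2l + lam * kl_terms
```

When the two decoders agree exactly, both KL terms vanish and the objective reduces to the sum of the two likelihood losses; any disagreement adds a penalty scaled by `lam`.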

Response Generation by Context-aware Prototype Editing

Jul 27, 2018

Yu Wu, Furu Wei, Shaohan Huang, Zhoujun Li, Ming Zhou

Abstract: Open-domain response generation has achieved remarkable progress in recent years, but it sometimes yields short and uninformative responses. We propose a new paradigm for response generation, namely response generation by editing, which significantly increases the diversity and informativeness of the generated results. Our assumption is that a plausible response can be generated by slightly revising an existing response prototype. The prototype is retrieved from a pre-defined index and provides a good starting point for generation because it is grammatical and informative. We design a response editing model in which an edit vector is formed by considering the differences between a prototype context and the current context; the edit vector is then fed to a decoder to revise the prototype response for the current context. Experimental results on a large-scale dataset demonstrate that the response editing model outperforms generative and retrieval-based models in various aspects.
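
The edit vector in this abstract is built from the differences between the prototype context and the current context. A minimal sketch, assuming "differences" means words that appear only in the current context (insertions) and words that appear only in the prototype context (deletions), and using plain averages of toy embeddings where the model would use learned, attention-weighted representations. `edit_vector` and the embedding table are hypothetical.

```python
from typing import Dict, List


def edit_vector(prototype_ctx: List[str], current_ctx: List[str],
                emb: Dict[str, List[float]]) -> List[float]:
    """Hypothetical edit vector: [mean(inserted word vectors); mean(deleted word vectors)]."""
    proto_set, curr_set = set(prototype_ctx), set(current_ctx)
    insert_words = [w for w in current_ctx if w not in proto_set]   # only in current context
    delete_words = [w for w in prototype_ctx if w not in curr_set]  # only in prototype context
    dim = len(next(iter(emb.values())))

    def mean(words: List[str]) -> List[float]:
        vecs = [emb[w] for w in words if w in emb]
        if not vecs:
            return [0.0] * dim  # no differences on this side
        return [sum(col) / len(vecs) for col in zip(*vecs)]

    # Concatenate the insertion and deletion summaries into one vector.
    return mean(insert_words) + mean(delete_words)
```

In the described model this vector is then fed to the decoder alongside the prototype response; the sketch stops at building the vector.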

R-VQA: Learning Visual Relation Facts with Semantic Attention for Visual Question Answering

Jul 20, 2018

Pan Lu, Lei Ji, Wei Zhang, Nan Duan, Ming Zhou, Jianyong Wang

Abstract: Recently, Visual Question Answering (VQA) has emerged as one of the most significant tasks in multimodal learning, as it requires understanding both visual and textual modalities. Existing methods mainly rely on extracting image and question features to learn their joint feature embedding via multimodal fusion or attention mechanisms. Some recent studies utilize external VQA-independent models to detect candidate entities or attributes in images, which serve as semantic knowledge complementary to the VQA task. However, these candidate entities or attributes might be unrelated to the VQA task and have limited semantic capacity. To better utilize semantic knowledge in images, we propose a novel framework to learn visual relation facts for VQA. Specifically, we build a Relation-VQA (R-VQA) dataset based on the Visual Genome dataset via a semantic similarity module, in which each data sample consists of an image, a corresponding question, a correct answer and a supporting relation fact. A well-defined relation detector is then adopted to predict visual question-related relation facts. We further propose a multi-step attention model, composed of visual attention followed by semantic attention, to extract related visual knowledge and semantic knowledge. We conduct comprehensive experiments on two benchmark datasets, demonstrating that our model achieves state-of-the-art performance and verifying the benefit of considering visual relation facts.

* 10 pages, 5 figures, accepted as an oral paper in SIGKDD 2018

Mean Field Multi-Agent Reinforcement Learning

Jul 19, 2018

Yaodong Yang, Rui Luo, Minne Li, Ming Zhou, Weinan Zhang, Jun Wang

Abstract: Existing multi-agent reinforcement learning methods are typically limited to a small number of agents. When the number of agents grows large, learning becomes intractable due to the curse of dimensionality and the exponential growth of agent interactions. In this paper, we present Mean Field Reinforcement Learning, in which the interactions within the population of agents are approximated by those between a single agent and the average effect of the overall population or of neighboring agents. The interplay between the two entities is mutually reinforcing: the learning of the individual agent's optimal policy depends on the dynamics of the population, while the dynamics of the population change according to the collective patterns of the individual policies. We develop practical mean field Q-learning and mean field Actor-Critic algorithms and analyze the convergence of the solution to a Nash equilibrium. Experiments on Gaussian squeeze, the Ising model, and battle games demonstrate the learning effectiveness of our mean field approaches. In addition, we report the first result of solving the Ising model via model-free reinforcement learning methods.

* ICML 2018 (Full paper + Long talk)
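
The core idea of mean field Q-learning described in this abstract is that each agent conditions its Q-function on its own action and the *mean action* of its neighbors instead of on every other agent's action. A minimal tabular-style sketch of one update step, assuming a softmax (Boltzmann) policy with inverse temperature `beta` and a discrete action set. All function names and parameters are hypothetical and follow the general form of the method, not the authors' implementation.

```python
import math
from typing import List, Sequence


def mean_action(neighbor_actions: Sequence[int], n_actions: int) -> List[float]:
    """Average of the neighbors' one-hot actions (the mean action a_bar)."""
    counts = [0.0] * n_actions
    for a in neighbor_actions:
        counts[a] += 1.0
    n = len(neighbor_actions)
    return [c / n for c in counts] if n else counts


def boltzmann(q_values: Sequence[float], beta: float) -> List[float]:
    """Softmax policy over actions; subtracting the max keeps exp() stable."""
    m = max(q_values)
    exps = [math.exp(beta * (q - m)) for q in q_values]
    z = sum(exps)
    return [e / z for e in exps]


def mf_q_update(q_sa: float, reward: float, next_q_values: Sequence[float],
                alpha: float, gamma: float, beta: float) -> float:
    """One mean field Q-learning step for a single agent.

    q_sa:          current Q(s, a_j, a_bar_j)
    next_q_values: Q(s', a, a_bar'_j) for every action a, where a_bar'_j is
                   the neighbors' mean action at the next state s'
    """
    pi = boltzmann(next_q_values, beta)
    # Mean field value of s': expected Q under the agent's own policy.
    v_next = sum(p * q for p, q in zip(pi, next_q_values))
    return (1 - alpha) * q_sa + alpha * (reward + gamma * v_next)
```

Because the Q-function takes only the mean action as input, its size no longer grows with the number of agents, which is the scalability argument made in the abstract.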

Reaching Human-level Performance in Automatic Grammatical Error Correction: An Empirical Study

Jul 11, 2018

Tao Ge, Furu Wei, Ming Zhou

Abstract: Neural sequence-to-sequence (seq2seq) approaches have proven successful in grammatical error correction (GEC). Based on the seq2seq framework, we propose a novel fluency boost learning and inference mechanism. Fluency boost learning generates diverse error-corrected sentence pairs during training, enabling the error correction model to learn how to improve a sentence's fluency from more instances, while fluency boost inference allows the model to correct a sentence incrementally over multiple inference steps. Combining fluency boost learning and inference with convolutional seq2seq models, our approach achieves state-of-the-art performance: 75.72 ($F_{0.5}$) on the CoNLL-2014 10-annotation dataset and 62.42 (GLEU) on the JFLEG test set, becoming the first GEC system to reach human-level performance (72.58 for CoNLL and 62.37 for JFLEG) on both benchmarks.

* Substantial text overlap with "Fluency Boost Learning and Inference for Neural Grammatical Error Correction" (accepted by ACL 2018)
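
The $F_{0.5}$ score cited for CoNLL-2014 is the F-measure $F_\beta = \frac{(1+\beta^2)PR}{\beta^2 P + R}$ with $\beta = 0.5$, which weights precision $P$ more heavily than recall $R$ (in GEC, both are computed over proposed edits). A small helper, for illustration only: the precision and recall values in the usage note are made up, not taken from the paper, and GLEU is a different metric that is not covered here.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """F-measure; beta < 1 favors precision, beta > 1 favors recall."""
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when nothing is correct
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

For example, with the same two numbers swapped, `f_beta(0.8, 0.4)` is about 0.667 while `f_beta(0.4, 0.8)` is about 0.444. This asymmetry is why $F_{0.5}$ rewards a corrector that proposes fewer but more reliable edits.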

Triangular Architecture for Rare Language Translation

Jul 11, 2018

Shuo Ren, Wenhu Chen, Shujie Liu, Mu Li, Ming Zhou, Shuai Ma

Abstract: Neural Machine Translation (NMT) performs poorly on the low-resource language pair $(X,Z)$, especially when $Z$ is a rare language. By introducing another rich language $Y$, we propose a novel triangular training architecture (TA-NMT) that leverages bilingual data $(Y,Z)$ (which may be small) and $(X,Y)$ (which can be rich) to improve the translation performance of low-resource pairs. In this triangular architecture, $Z$ is taken as the intermediate latent variable, and the translation models of $Z$ are jointly optimized with a unified bidirectional EM algorithm under the goal of maximizing the translation likelihood of $(X,Y)$. Empirical results demonstrate that our method significantly improves the translation quality of rare languages on the MultiUN and IWSLT2012 datasets, and achieves even better performance when combined with back-translation methods.

* Accepted to ACL 2018, 10 pages, 5 figures, 5 tables (with 5-5-5-5 high score)

Neural Document Summarization by Jointly Learning to Score and Select Sentences

Jul 06, 2018

Qingyu Zhou, Nan Yang, Furu Wei, Shaohan Huang, Ming Zhou, Tiejun Zhao

Abstract: Sentence scoring and sentence selection are the two main steps in extractive document summarization systems. However, previous work treats them as two separate subtasks. In this paper, we present a novel end-to-end neural network framework for extractive document summarization that jointly learns to score and select sentences. It first reads the document sentences with a hierarchical encoder to obtain sentence representations. It then builds the output summary by extracting sentences one by one. Unlike previous methods, our approach integrates the selection strategy into the scoring model, which directly predicts the relative importance of a sentence given the previously selected sentences. Experiments on the CNN/Daily Mail dataset show that the proposed framework significantly outperforms state-of-the-art extractive summarization models.

* In ACL 2018

Sequential Copying Networks

Jul 06, 2018

Qingyu Zhou, Nan Yang, Furu Wei, Ming Zhou

Abstract: The copying mechanism has proven effective in sequence-to-sequence neural network models for text generation tasks such as abstractive sentence summarization and question generation. However, existing work on modeling copying or pointing mechanisms only considers single-word copying from the source sentences. In this paper, we propose a novel copying framework, named Sequential Copying Networks (SeqCopyNet), which learns to copy not only single words but also sequences from the input sentence. It leverages pointer networks to explicitly select a sub-span from the source side and copy it to the target side, and integrates this sequential copying mechanism into the generation process of the encoder-decoder paradigm. Experiments on abstractive sentence summarization and question generation tasks show that the proposed SeqCopyNet can copy meaningful spans and outperforms the baseline models.

* In AAAI 2018
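
Copying a whole sub-span, rather than a single word, amounts to choosing a start and an end position in the source. A minimal sketch of that selection step, assuming a pointer network has already produced a start score and an end score for every source token. This joint start-plus-end search is a common simplification and is not SeqCopyNet's actual decoding procedure, which the abstract describes only at a high level. `best_span`, `copy_span` and `max_len` are hypothetical.

```python
from typing import List, Sequence, Tuple


def best_span(start_scores: Sequence[float], end_scores: Sequence[float],
              max_len: int = 10) -> Tuple[int, int]:
    """Return (i, j) with i <= j maximizing start_scores[i] + end_scores[j],
    restricted to spans of at most max_len tokens."""
    best, best_score = (0, 0), float("-inf")
    for i, s in enumerate(start_scores):
        for j in range(i, min(i + max_len, len(end_scores))):
            score = s + end_scores[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best


def copy_span(source: List[str], span: Tuple[int, int]) -> List[str]:
    """Copy the selected source tokens (inclusive span) to the output."""
    i, j = span
    return source[i:j + 1]
```

With source tokens `["the", "new", "york", "times", "said"]`, a high start score on "new" and a high end score on "times" yield the span `(1, 3)`. The multi-word name is then copied in one step instead of three separate single-word copies.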

Achieving Human Parity on Automatic Chinese to English News Translation

Jun 29, 2018

Hany Hassan, Anthony Aue, Chang Chen, Vishal Chowdhary, Jonathan Clark, Christian Federmann, Xuedong Huang, Marcin Junczys-Dowmunt, William Lewis, Mu Li (+14 more)

Abstract: Machine translation has made rapid advances in recent years. Millions of people use it today in online translation systems and mobile applications to communicate across language barriers. The question naturally arises whether such systems can approach or achieve parity with human translations. In this paper, we first address the problem of how to define and accurately measure human parity in translation. We then describe Microsoft's machine translation system and measure the quality of its translations on the widely used WMT 2017 news translation task from Chinese to English. We find that our latest neural machine translation system has reached a new state of the art, and that its translation quality is at human parity when compared to professional human translations. We also find that it significantly exceeds the quality of crowd-sourced non-professional translations.

Dictionary-Guided Editing Networks for Paraphrase Generation

Jun 21, 2018

Shaohan Huang, Yu Wu, Furu Wei, Ming Zhou

Abstract: An intuitive way for a human to write paraphrase sentences is to replace words or phrases in the original sentence with their corresponding synonyms and make the changes necessary to ensure the new sentences are fluent and grammatically correct. We propose a novel approach to modeling this process with dictionary-guided editing networks, which effectively rewrite the source sentence to generate paraphrase sentences. The approach jointly learns to select appropriate word-level and phrase-level paraphrase pairs from an off-the-shelf dictionary, in the context of the original sentence, and to generate fluent natural language sentences. Specifically, the system retrieves a set of word-level and phrase-level paraphrase pairs derived from the Paraphrase Database (PPDB) for the original sentence; these pairs guide the decision of which words might be deleted or inserted, via a soft attention mechanism under the sequence-to-sequence framework. We conduct experiments on two benchmark datasets for paraphrase generation, namely MSCOCO and Quora. The evaluation results demonstrate that our dictionary-guided editing networks outperform the baseline methods.