Zhilin Yang

HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering

Sep 25, 2018

Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William W. Cohen, Ruslan Salakhutdinov, Christopher D. Manning

Abstract: Existing question answering (QA) datasets fail to train QA systems to perform complex reasoning and provide explanations for answers. We introduce HotpotQA, a new dataset with 113k Wikipedia-based question-answer pairs with four key features: (1) the questions require finding and reasoning over multiple supporting documents to answer; (2) the questions are diverse and not constrained to any pre-existing knowledge bases or knowledge schemas; (3) we provide sentence-level supporting facts required for reasoning, allowing QA systems to reason with strong supervision and explain the predictions; (4) we offer a new type of factoid comparison questions to test QA systems' ability to extract relevant facts and perform necessary comparison. We show that HotpotQA is challenging for the latest QA systems, and the supporting facts enable models to improve performance and make explainable predictions.

* EMNLP 2018 long paper. The first three authors contribute equally. Data, code, and blog posts available at https://hotpotqa.github.io/
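
The sentence-level supporting facts mentioned in the abstract lend themselves to a set-based score. The sketch below is an illustration only, not the official evaluation script: it assumes each supporting fact is represented as an (article title, sentence index) pair, and the function name and toy data are hypothetical.

```python
from typing import Iterable, Set, Tuple

# Assumed representation: (article title, sentence index within that article).
Fact = Tuple[str, int]


def supporting_fact_f1(predicted: Iterable[Fact], gold: Iterable[Fact]) -> float:
    """Set-level F1 between predicted and gold supporting sentences."""
    pred_set: Set[Fact] = set(predicted)
    gold_set: Set[Fact] = set(gold)
    true_positives = len(pred_set & gold_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy example: the gold facts span two different articles,
# which is what makes the question multi-hop.
gold = [("Article A", 0), ("Article B", 2)]
predicted = [("Article A", 0), ("Article B", 1), ("Article B", 2)]
score = supporting_fact_f1(predicted, gold)
print(round(score, 3))
```

Here the prediction recovers both gold sentences but adds one spurious sentence, so recall is perfect while precision is two thirds.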

Neural Cross-Lingual Named Entity Recognition with Minimal Resources

Sep 11, 2018

Jiateng Xie, Zhilin Yang, Graham Neubig, Noah A. Smith, Jaime Carbonell

Abstract: For languages with no annotated resources, unsupervised transfer of natural language processing models such as named-entity recognition (NER) from resource-rich languages would be an appealing capability. However, differences in words and word order across languages make it a challenging problem. To improve mapping of lexical items across languages, we propose a method that finds translations based on bilingual word embeddings. To improve robustness to word order differences, we propose to use self-attention, which allows for a degree of flexibility with respect to word order. We demonstrate that these methods achieve state-of-the-art or competitive NER performance on commonly tested languages under a cross-lingual setting, with much lower resource requirements than past approaches. We also evaluate the challenges of applying these methods to Uyghur, a low-resource language.

* EMNLP 2018 long paper
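
Finding translations from bilingual word embeddings can be pictured as a nearest-neighbor lookup in a shared vector space. The sketch below assumes cosine similarity and tiny hand-made vectors; the abstract does not specify the similarity measure, so that choice, the function names, and the toy vocabulary are all assumptions.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def translate(word: str,
              source_vectors: Dict[str, List[float]],
              target_vectors: Dict[str, List[float]]) -> str:
    """Return the target-language word closest to `word` in the shared space."""
    query = source_vectors[word]
    return max(target_vectors, key=lambda t: cosine(query, target_vectors[t]))


# Hypothetical embeddings, assumed to be already aligned across languages.
source_vectors = {"dog": [0.9, 0.1, 0.0], "city": [0.0, 0.2, 0.9]}
target_vectors = {
    "perro": [0.8, 0.2, 0.1],
    "ciudad": [0.1, 0.1, 0.95],
    "rojo": [0.1, 0.9, 0.1],
}
translated = [translate(w, source_vectors, target_vectors) for w in ["dog", "city"]]
print(translated)
```

Translating the words of a labeled source-language corpus this way would yield training text in the target vocabulary while keeping the original entity labels.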

GLoMo: Unsupervisedly Learned Relational Graphs as Transferable Representations

Jul 02, 2018

Zhilin Yang, Jake Zhao, Bhuwan Dhingra, Kaiming He, William W. Cohen, Ruslan Salakhutdinov, Yann LeCun

Abstract: Modern deep transfer learning approaches have mainly focused on learning generic feature vectors from one task that are transferable to other tasks, such as word embeddings in language and pretrained convolutional features in vision. However, these approaches usually transfer unary features and largely ignore more structured graphical representations. This work explores the possibility of learning generic latent relational graphs that capture dependencies between pairs of data units (e.g., words or pixels) from large-scale unlabeled data and transferring the graphs to downstream tasks. Our proposed transfer learning framework improves performance on various tasks including question answering, natural language inference, sentiment analysis, and image classification. We also show that the learned graphs are generic enough to be transferred to different embeddings on which the graphs have not been trained (including GloVe embeddings, ELMo embeddings, and task-specific RNN hidden units), or embedding-free units such as image pixels.

Neural Models for Reasoning over Multiple Mentions using Coreference

Apr 16, 2018

Bhuwan Dhingra, Qiao Jin, Zhilin Yang, William W. Cohen, Ruslan Salakhutdinov

Abstract: Many problems in NLP require aggregating information from multiple mentions of the same entity which may be far apart in the text. Existing Recurrent Neural Network (RNN) layers are biased towards short-term dependencies and hence not suited to such tasks. We present a recurrent layer which is instead biased towards coreferent dependencies. The layer uses coreference annotations extracted from an external system to connect entity mentions belonging to the same cluster. Incorporating this layer into a state-of-the-art reading comprehension model improves performance on three datasets (WikiHop, LAMBADA and the bAbI tasks), with large gains when training data is scarce.

* NAACL 2018 (Short Paper)

Mastering the Dungeon: Grounded Language Learning by Mechanical Turker Descent

Apr 16, 2018

Zhilin Yang, Saizheng Zhang, Jack Urbanek, Will Feng, Alexander H. Miller, Arthur Szlam, Douwe Kiela, Jason Weston

Abstract: Contrary to most natural language processing research, which makes use of static datasets, humans learn language interactively, grounded in an environment. In this work we propose an interactive learning procedure called Mechanical Turker Descent (MTD) and use it to train agents to execute natural language commands grounded in a fantasy text adventure game. In MTD, Turkers compete to train better agents in the short term, and collaborate by sharing their agents' skills in the long term. This results in a gamified, engaging experience for the Turkers and a better quality teaching signal for the agents compared to static datasets, as the Turkers naturally adapt the training data to the agent's abilities.

Breaking the Softmax Bottleneck: A High-Rank RNN Language Model

Mar 02, 2018

Zhilin Yang, Zihang Dai, Ruslan Salakhutdinov, William W. Cohen

Abstract: We formulate language modeling as a matrix factorization problem, and show that the expressiveness of Softmax-based models (including the majority of neural language models) is limited by a Softmax bottleneck. Given that natural language is highly context-dependent, this further implies that in practice Softmax with distributed word embeddings does not have enough capacity to model natural language. We propose a simple and effective method to address this issue, and improve the state-of-the-art perplexities on Penn Treebank and WikiText-2 to 47.69 and 40.68 respectively. The proposed method also excels on the large-scale 1B Word dataset, outperforming the baseline by over 5.6 points in perplexity.

* ICLR Oral 2018
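
The matrix factorization view in the abstract can be written out in a few lines. The notation below is an assumption introduced for illustration (it does not appear in the abstract): there are N contexts c_i with hidden vectors h_i in R^d stacked into H, M vocabulary words x_j with embeddings w_j in R^d stacked into W, and z is the vector of per-context log-partition values log Z_i.

```latex
\begin{aligned}
P_\theta(x_j \mid c_i) &= \frac{\exp(\mathbf{h}_i^\top \mathbf{w}_j)}{\sum_{k=1}^{M} \exp(\mathbf{h}_i^\top \mathbf{w}_k)}
  = \frac{\exp(\mathbf{h}_i^\top \mathbf{w}_j)}{Z_i}, \\
\log P_\theta(x_j \mid c_i) &= \mathbf{h}_i^\top \mathbf{w}_j - \log Z_i, \\
A_\theta &= H W^\top - \mathbf{z}\,\mathbf{1}_M^\top, \qquad
\operatorname{rank}(A_\theta) \le \operatorname{rank}(H W^\top) + 1 \le d + 1 .
\end{aligned}
```

Under these assumptions, the N-by-M matrix of model log-probabilities has rank at most d + 1, so if the matrix of true log-probabilities has higher rank, a single Softmax over d-dimensional embeddings cannot reproduce it exactly. This is one way to read the bottleneck the abstract describes.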

Differentiable Learning of Logical Rules for Knowledge Base Reasoning

Nov 27, 2017

Fan Yang, Zhilin Yang, William W. Cohen

Abstract: We study the problem of learning probabilistic first-order logical rules for knowledge base reasoning. This learning problem is difficult because it requires learning the parameters in a continuous space as well as the structure in a discrete space. We propose a framework, Neural Logic Programming, that combines the parameter and structure learning of first-order logical rules in an end-to-end differentiable model. This approach is inspired by a recently developed differentiable logic called TensorLog, where inference tasks can be compiled into sequences of differentiable operations. We design a neural controller system that learns to compose these operations. Empirically, our method outperforms prior work on multiple knowledge base benchmark datasets, including Freebase and WikiMovies.

* Accepted at NIPS 2017
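
The idea of compiling rule inference into differentiable operations can be sketched with plain matrix-vector products: each relation becomes a matrix over entities, a query entity becomes a one-hot vector, and a chain rule becomes a product of relation matrices. The example below is a toy under stated assumptions; the entity names, the `has_child` relation, and the fixed rule weight are hypothetical, and in the framework described above such weights would be learned rather than set by hand.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m by vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


entities = ["ann", "bob", "cat"]

# has_child[i][j] = 1.0 means entities[j] has child entities[i],
# so applying the matrix to a one-hot vector for x yields x's children.
has_child: Matrix = [
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],  # ann has child bob
    [0.0, 1.0, 0.0],  # bob has child cat
]

# Rule: has_grandchild(X, Z) <- has_child(X, Y) and has_child(Y, Z),
# with a hypothetical confidence weight of 0.9.
rule_weight = 0.9
query = [1.0, 0.0, 0.0]  # one-hot vector for "ann"

one_hop = matvec(has_child, query)
two_hop = matvec(has_child, one_hop)
scores = [rule_weight * s for s in two_hop]
print(dict(zip(entities, scores)))
```

Because every step is a product or a sum, the final scores are differentiable with respect to the rule weight, which is what makes gradient-based learning of rule confidences possible in this style of model.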

Good Semi-supervised Learning that Requires a Bad GAN

Nov 03, 2017

Zihang Dai, Zhilin Yang, Fan Yang, William W. Cohen, Ruslan Salakhutdinov

Abstract: Semi-supervised learning methods based on generative adversarial networks (GANs) have obtained strong empirical results, but it is not clear 1) how the discriminator benefits from joint training with a generator, and 2) why good semi-supervised classification performance and a good generator cannot be obtained at the same time. Theoretically, we show that given the discriminator objective, good semi-supervised learning indeed requires a bad generator, and propose the definition of a preferred generator. Empirically, we derive a novel formulation based on our analysis that substantially improves over feature matching GANs, obtaining state-of-the-art results on multiple benchmark datasets.

* NIPS 2017 camera ready

Words or Characters? Fine-grained Gating for Reading Comprehension

Sep 11, 2017

Zhilin Yang, Bhuwan Dhingra, Ye Yuan, Junjie Hu, William W. Cohen, Ruslan Salakhutdinov

Abstract: Previous work combines word-level and character-level representations using concatenation or scalar weighting, which is suboptimal for high-level tasks like reading comprehension. We present a fine-grained gating mechanism to dynamically combine word-level and character-level representations based on properties of the words. We also extend the idea of fine-grained gating to modeling the interaction between questions and paragraphs for reading comprehension. Experiments show that our approach can improve the performance on reading comprehension tasks, achieving new state-of-the-art results on the Children's Book Test dataset. To demonstrate the generality of our gating mechanism, we also show improved results on a social media tag prediction task.

* Accepted as a conference paper at ICLR 2017
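
The contrast between scalar weighting and fine-grained gating can be shown with a small element-wise mix. This is a minimal sketch, assuming the gate is a sigmoid applied per dimension; in the setting the abstract describes the gate would be computed from properties of the word, whereas here the hypothetical `gate_logits` are simply passed in by hand.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def fine_grained_gate(word_vec: List[float],
                      char_vec: List[float],
                      gate_logits: List[float]) -> List[float]:
    """Mix two representations with a separate gate value per dimension."""
    gate = [sigmoid(z) for z in gate_logits]
    return [g * c + (1.0 - g) * w for g, c, w in zip(gate, char_vec, word_vec)]


# Toy vectors chosen so the effect of each gate value is easy to read off.
word_vec = [1.0, 1.0, 1.0]
char_vec = [0.0, 0.0, 0.0]
gate_logits = [-20.0, 0.0, 20.0]  # hypothetical values, one per dimension

mixed = fine_grained_gate(word_vec, char_vec, gate_logits)
print([round(x, 3) for x in mixed])
```

The first dimension is taken almost entirely from the word-level vector, the last almost entirely from the character-level vector, and the middle one is an even blend. A single scalar weight would force the same ratio on every dimension.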

Semi-Supervised QA with Generative Domain-Adaptive Nets

Apr 22, 2017

Zhilin Yang, Junjie Hu, Ruslan Salakhutdinov, William W. Cohen

Abstract: We study the problem of semi-supervised question answering: utilizing unlabeled text to boost the performance of question answering models. We propose a novel training framework, the Generative Domain-Adaptive Nets. In this framework, we train a generative model to generate questions based on the unlabeled text, and combine model-generated questions with human-generated questions for training question answering models. We develop novel domain adaptation algorithms, based on reinforcement learning, to alleviate the discrepancy between the model-generated data distribution and the human-generated data distribution. Experiments show that our proposed framework obtains substantial improvement from unlabeled text.

* Accepted as a long paper at ACL 2017
