Kenton Lee

Probabilistic Assumptions Matter: Improved Models for Distantly-Supervised Document-Level Question Answering

May 05, 2020

Hao Cheng, Ming-Wei Chang, Kenton Lee, Kristina Toutanova

Abstract: We address the problem of extractive question answering using document-level distant supervision, pairing questions and relevant documents with answer strings. We compare previously used probability space and distant supervision assumptions (assumptions on the correspondence between the weak answer string labels and possible answer mention spans). We show that these assumptions interact, and that different configurations provide complementary benefits. We demonstrate that a multi-objective model can efficiently combine the advantages of multiple assumptions and outperform the best individual formulation. Our approach outperforms previous state-of-the-art models by 4.3 points in F1 on TriviaQA-Wiki and 1.7 points in Rouge-L on NarrativeQA summaries.

* ACL 2020
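
To make the idea of "distant supervision assumptions" concrete, here is a minimal sketch contrasting two objectives often used when several spans match the answer string: marginalizing over all matching spans versus keeping only the most probable one. The function names, the toy scores, and the choice of these two objectives are illustrative assumptions, not the paper's implementation.

```python
import math


def log_softmax(scores):
    """Normalize raw span scores into log-probabilities (numerically stable)."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def marginal_loss(scores, matches):
    """Negative log of the total probability of all spans matching the answer string."""
    matched = [lp for lp, ok in zip(log_softmax(scores), matches) if ok]
    m = max(matched)
    return -(m + math.log(sum(math.exp(lp - m) for lp in matched)))


def max_loss(scores, matches):
    """Negative log-probability of the single best matching span."""
    return -max(lp for lp, ok in zip(log_softmax(scores), matches) if ok)


# Toy example: four candidate spans, two of which match the answer string.
scores = [0.0, 0.0, 0.0, 0.0]
matches = [True, True, False, False]
print(marginal_loss(scores, matches), max_loss(scores, matches))
```

The marginal objective is never larger than the max objective, because the summed probability of the matching spans is at least the probability of any single one of them.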

Contextualized Representations Using Textual Encyclopedic Knowledge

Apr 24, 2020

Mandar Joshi, Kenton Lee, Yi Luan, Kristina Toutanova

Abstract: We present a method to represent input texts by contextualizing them jointly with dynamically retrieved textual encyclopedic background knowledge from multiple documents. We apply our method to reading comprehension tasks by encoding questions and passages together with background sentences about the entities they mention. We show that integrating background knowledge from text is effective for tasks focusing on factual reasoning and allows direct reuse of powerful pretrained BERT-style encoders. Moreover, knowledge integration can be further improved with suitable pretraining via a self-supervised masked language model objective over words in background-augmented input text. On TriviaQA, our approach obtains improvements of 1.6 to 3.1 F1 over comparable RoBERTa models which do not integrate background knowledge dynamically. On MRQA, a large collection of diverse QA datasets, we see consistent gains in-domain along with large improvements out-of-domain on BioASQ (2.1 to 4.2 F1), TextbookQA (1.6 to 2.0 F1), and DuoRC (1.1 to 2.0 F1).

REALM: Retrieval-Augmented Language Model Pre-Training

Feb 10, 2020

Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, Ming-Wei Chang

Abstract: Language model pre-training has been shown to capture a surprising amount of world knowledge, crucial for NLP tasks such as question answering. However, this knowledge is stored implicitly in the parameters of a neural network, requiring ever-larger networks to cover more facts. To capture knowledge in a more modular and interpretable way, we augment language model pre-training with a latent knowledge retriever, which allows the model to retrieve and attend over documents from a large corpus such as Wikipedia, used during pre-training, fine-tuning and inference. For the first time, we show how to pre-train such a knowledge retriever in an unsupervised manner, using masked language modeling as the learning signal and backpropagating through a retrieval step that considers millions of documents. We demonstrate the effectiveness of Retrieval-Augmented Language Model pre-training (REALM) by fine-tuning on the challenging task of Open-domain Question Answering (Open-QA). We compare against state-of-the-art models for both explicit and implicit knowledge storage on three popular Open-QA benchmarks, and find that we outperform all previous methods by a significant margin (4-16% absolute accuracy), while also providing qualitative benefits such as interpretability and modularity.
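
A latent retriever of the kind described above can be pictured as a distribution over documents that is marginalized out when scoring a prediction. The sketch below assumes inner-product relevance scores and a softmax over a small candidate set; the vectors, the per-document answer probabilities, and all function names are hypothetical and stand in for learned model components.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def retrieval_distribution(query_vec, doc_vecs):
    """Softmax over query-document inner products: a stand-in for p(z | x)."""
    scores = [dot(query_vec, d) for d in doc_vecs]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def marginal_likelihood(query_vec, doc_vecs, answer_prob_given_doc):
    """Sum over documents of p(z | x) * p(y | z, x)."""
    p_z = retrieval_distribution(query_vec, doc_vecs)
    return sum(pz * py for pz, py in zip(p_z, answer_prob_given_doc))


# Toy example: two equally relevant documents with different answer probabilities.
print(marginal_likelihood([1.0, 0.0], [[1.0, 0.0], [1.0, 0.0]], [0.8, 0.2]))
```

Because the marginal depends on the retrieval scores, a gradient of this quantity can in principle flow back into whatever produces the query and document vectors, which is the sense in which a retrieval step can be trained from the prediction signal alone.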

Well-Read Students Learn Better: On the Importance of Pre-training Compact Models

Sep 25, 2019

Iulia Turc, Ming-Wei Chang, Kenton Lee, Kristina Toutanova

Abstract: Recent developments in natural language representations have been accompanied by large and expensive models that leverage vast amounts of general-domain text through self-supervised pre-training. Due to the cost of applying such models to downstream tasks, several model compression techniques on pre-trained language representations have been proposed (Sun et al., 2019; Sanh, 2019). However, surprisingly, the simple baseline of just pre-training and fine-tuning compact models has been overlooked. In this paper, we first show that pre-training remains important in the context of smaller architectures, and fine-tuning pre-trained compact models can be competitive with more elaborate methods proposed in concurrent work. Starting with pre-trained compact models, we then explore transferring task knowledge from large fine-tuned models through standard knowledge distillation. The resulting simple, yet effective and general algorithm, Pre-trained Distillation, brings further improvements. Through extensive experiments, we more generally explore the interaction between pre-training and distillation under two variables that have been under-studied: model size and properties of unlabeled task data. One surprising observation is that they have a compound effect even when sequentially applied on the same data. To accelerate future research, we will make our 24 pre-trained miniature BERT models publicly available.

* Added comparison to concurrent work
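
For readers unfamiliar with the "standard knowledge distillation" step mentioned in the abstract, the following is a minimal sketch of a soft-label distillation loss: the student is trained to match the teacher's predicted class distribution. The logits and function names are made up for illustration, and details such as temperature scaling are deliberately left out.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def distillation_loss(teacher_logits, student_logits):
    """Cross-entropy of the student's distribution against the teacher's soft labels."""
    teacher_probs = [math.exp(lp) for lp in log_softmax(teacher_logits)]
    student_log_probs = log_softmax(student_logits)
    return -sum(p * lq for p, lq in zip(teacher_probs, student_log_probs))


# A student that agrees with the teacher incurs a lower loss than one that disagrees.
teacher = [2.0, 0.0]
print(distillation_loss(teacher, [2.0, 0.0]), distillation_loss(teacher, [0.0, 2.0]))
```

In a pipeline like the one the abstract describes, a loss of this form would be applied to a compact model that has already been pre-trained, using teacher predictions on unlabeled task data.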

Giving BERT a Calculator: Finding Operations and Arguments with Reading Comprehension

Sep 12, 2019

Daniel Andor, Luheng He, Kenton Lee, Emily Pitler

Abstract: Reading comprehension models have been successfully applied to extractive text answers, but it is unclear how best to generalize these models to abstractive numerical answers. We enable a BERT-based reading comprehension model to perform lightweight numerical reasoning. We augment the model with a predefined set of executable 'programs' which encompass simple arithmetic as well as extraction. Rather than having to learn to manipulate numbers directly, the model can pick a program and execute it. On the recent Discrete Reasoning Over Passages (DROP) dataset, designed to challenge reading comprehension models, we show a 33% absolute improvement by adding shallow programs. The model can learn to predict new operations when appropriate in a math word problem setting (Roy and Roth, 2015) with very few training examples.
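
The "pick a program and execute it" idea can be sketched as follows: enumerate shallow arithmetic programs over numbers found in the passage, let a scorer choose one, and run it. The operation set, the regular expression, and the `score_fn` callback are hypothetical stand-ins; in the paper the choice is made by a BERT-based model, which is not reproduced here.

```python
import re
from itertools import permutations

# Hypothetical operation set; the real system's set of programs may differ.
OPS = {
    "sum": lambda a, b: a + b,
    "diff": lambda a, b: a - b,
}


def extract_numbers(passage):
    """Collect the numeric mentions in the passage as floats."""
    return [float(x) for x in re.findall(r"\d+(?:\.\d+)?", passage)]


def candidate_programs(numbers):
    """Yield every (operation, arg1, arg2) program together with its result."""
    for name, fn in OPS.items():
        for a, b in permutations(numbers, 2):
            yield (name, a, b), fn(a, b)


def execute_best(passage, score_fn):
    """Run the program that score_fn rates highest and return it with its value."""
    numbers = extract_numbers(passage)
    return max(candidate_programs(numbers), key=lambda pv: score_fn(pv[0]))


passage = "The Bears scored 21 points and the Lions scored 14 points."
# Stand-in scorer that prefers subtracting 14 from 21.
program, value = execute_best(
    passage, lambda p: 1.0 if p == ("diff", 21.0, 14.0) else 0.0
)
print(program, value)
```

The model never has to compute the arithmetic itself; it only has to rank programs, and the executor produces the number.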

Zero-Shot Entity Linking by Reading Entity Descriptions

Jun 18, 2019

Lajanugen Logeswaran, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, Jacob Devlin, Honglak Lee

Abstract: We present the zero-shot entity linking task, where mentions must be linked to unseen entities without in-domain labeled data. The goal is to enable robust transfer to highly specialized domains, and so no metadata or alias tables are assumed. In this setting, entities are only identified by text descriptions, and models must rely strictly on language understanding to resolve the new entities. First, we show that strong reading comprehension models pre-trained on large unlabeled data can be used to generalize to unseen entities. Second, we propose a simple and effective adaptive pre-training strategy, which we term domain-adaptive pre-training (DAP), to address the domain shift problem associated with linking unseen entities in a new domain. We present experiments on a new dataset that we construct for this task and show that DAP improves over strong pre-training baselines, including BERT. The data and code are available at https://github.com/lajanugen/zeshel.

* ACL 2019

Latent Retrieval for Weakly Supervised Open Domain Question Answering

Jun 06, 2019

Kenton Lee, Ming-Wei Chang, Kristina Toutanova

Abstract: Recent work on open domain question answering (QA) assumes strong supervision of the supporting evidence and/or assumes a black-box information retrieval (IR) system to retrieve evidence candidates. We argue that both are suboptimal, since gold evidence is not always available, and QA is fundamentally different from IR. We show for the first time that it is possible to jointly learn the retriever and reader from question-answer string pairs and without any IR system. In this setting, evidence retrieval from all of Wikipedia is treated as a latent variable. Since this is impractical to learn from scratch, we pre-train the retriever with an Inverse Cloze Task. We evaluate on open versions of five QA datasets. On datasets where the questioner already knows the answer, a traditional IR system such as BM25 is sufficient. On datasets where a user is genuinely seeking an answer, we show that learned retrieval is crucial, outperforming BM25 by up to 19 points in exact match.

* Accepted to ACL 2019
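
The Inverse Cloze Task mentioned in the abstract can be pictured as building pseudo-question/pseudo-evidence pairs from unlabeled text: one sentence plays the role of the question and its surrounding passage plays the role of the evidence. The sketch below is an assumption-laden illustration; the function name and the `remove_prob` parameter (how often the chosen sentence is dropped from its context) are hypothetical, and the default value is only illustrative.

```python
import random


def make_ict_example(sentences, rng, remove_prob=0.9):
    """Pick one sentence as the pseudo-query and the passage as pseudo-evidence.

    With probability remove_prob the chosen sentence is removed from the
    context, so the retriever cannot rely purely on exact word overlap.
    """
    idx = rng.randrange(len(sentences))
    query = sentences[idx]
    if rng.random() < remove_prob:
        context = sentences[:idx] + sentences[idx + 1:]
    else:
        context = list(sentences)
    return query, context


passage = [
    "Zebras have black and white stripes.",
    "The stripes are unique to each individual.",
    "They live in herds on the savanna.",
]
query, context = make_ict_example(passage, random.Random(0))
print(query, context)
```

A retriever pre-trained on such pairs learns to match a sentence to the passage it came from, which gives it a starting point before any question-answer pairs are seen.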

BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions

May 24, 2019

Christopher Clark, Kenton Lee, Ming-Wei Chang, Tom Kwiatkowski, Michael Collins, Kristina Toutanova

Abstract: In this paper we study yes/no questions that are naturally occurring, meaning that they are generated in unprompted and unconstrained settings. We build a reading comprehension dataset, BoolQ, of such questions, and show that they are unexpectedly challenging. They often query for complex, non-factoid information, and require difficult entailment-like inference to solve. We also explore the effectiveness of a range of transfer learning baselines. We find that transferring from entailment data is more effective than transferring from paraphrase or extractive QA data, and that it, surprisingly, continues to be very beneficial even when starting from massive pre-trained language models such as BERT. Our best method trains BERT on MultiNLI and then re-trains it on our train set. It achieves 80.4% accuracy compared to 90% accuracy of human annotators (and 62% majority-baseline), leaving a significant gap for future work.

* In NAACL 2019

Language Model Pre-training for Hierarchical Document Representations

Jan 26, 2019

Ming-Wei Chang, Kristina Toutanova, Kenton Lee, Jacob Devlin

Abstract: Hierarchical neural architectures are often used to capture long-distance dependencies and have been applied to many document-level tasks such as summarization, document segmentation, and sentiment analysis. However, effective usage of such a large context can be difficult to learn, especially in the case where there is limited labeled data available. Building on the recent success of language model pretraining methods for learning flat representations of text, we propose algorithms for pre-training hierarchical document representations from unlabeled data. Unlike prior work, which has focused on pre-training contextual token representations or context-independent sentence/paragraph representations, our hierarchical document representations include fixed-length sentence/paragraph representations which integrate contextual information from the entire documents. Experiments on document segmentation, document-level question answering, and extractive document summarization demonstrate the effectiveness of the proposed pre-training algorithms.

A BERT Baseline for the Natural Questions

Jan 24, 2019

Chris Alberti, Kenton Lee, Michael Collins

Abstract: This technical note describes a new baseline for the Natural Questions. Our model is based on BERT and reduces the gap between the model F1 scores reported in the original dataset paper and the human upper bound by 30% and 50% relative for the long and short answer tasks respectively. This baseline has been submitted to the official NQ leaderboard at ai.google.com/research/NaturalQuestions and we plan to open-source the code for it in the near future.