Ming-Wei Chang

Joint Passage Ranking for Diverse Multi-Answer Retrieval

Apr 17, 2021

Sewon Min, Kenton Lee, Ming-Wei Chang, Kristina Toutanova, Hannaneh Hajishirzi

Abstract: We study multi-answer retrieval, an under-explored problem that requires retrieving passages to cover multiple distinct answers for a given question. This task requires joint modeling of retrieved passages, as models should not repeatedly retrieve passages containing the same answer at the cost of missing a different valid answer. Prior work focusing on single-answer retrieval is limited as it cannot reason about the set of passages jointly. In this paper, we introduce JPR, a joint passage retrieval model focusing on reranking. To model the joint probability of the retrieved passages, JPR makes use of an autoregressive reranker that selects a sequence of passages, equipped with novel training and decoding algorithms. Compared to prior approaches, JPR achieves significantly better answer coverage on three multi-answer datasets. When combined with downstream question answering, the improved retrieval enables larger answer generation models since they need to consider fewer passages, establishing a new state-of-the-art.

* 13 pages
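
The abstract describes JPR as an autoregressive reranker that picks passages one after another. The sketch below is not JPR, which is a learned model with its own training and decoding algorithms; it is a hand-written greedy stand-in, assuming hypothetical `relevance` and `answer_of` functions and a fixed `repeat_penalty`, that only shows why conditioning each pick on the earlier picks can raise answer coverage.

```python
from typing import Callable, List, Sequence, Set


def greedy_diverse_rerank(
    candidates: Sequence[str],
    answer_of: Callable[[str], str],
    relevance: Callable[[str], float],
    k: int,
    repeat_penalty: float = 1.0,
) -> List[str]:
    """Select up to k passages one at a time; each step sees the earlier picks."""
    selected: List[str] = []
    covered: Set[str] = set()  # answers already present in the selection
    remaining = list(candidates)
    while remaining and len(selected) < k:
        # Score conditioned on the current selection: penalize repeated answers.
        def score(passage: str) -> float:
            penalty = repeat_penalty if answer_of(passage) in covered else 0.0
            return relevance(passage) - penalty

        best = max(remaining, key=score)
        selected.append(best)
        covered.add(answer_of(best))
        remaining.remove(best)
    return selected


def answer_coverage(
    passages: Sequence[str], answer_of: Callable[[str], str], gold: Sequence[str]
) -> float:
    """Fraction of distinct gold answers that appear in the selected passages."""
    found = {answer_of(p) for p in passages} & set(gold)
    return len(found) / len(set(gold))


# Toy data (hypothetical): p1 and p2 contain the same answer.
answers = {"p1": "A", "p2": "A", "p3": "B"}
scores = {"p1": 0.9, "p2": 0.8, "p3": 0.5}
picked = greedy_diverse_rerank(["p1", "p2", "p3"], answers.get, scores.get, k=2)
print(picked, answer_coverage(picked, answers.get, ["A", "B"]))  # ['p1', 'p3'] 1.0
```

Plain top-2 ranking by relevance alone would return `p1` and `p2`, which share answer A, for a coverage of only 0.5.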

Unlocking Compositional Generalization in Pre-trained Models Using Intermediate Representations

Apr 15, 2021

Jonathan Herzig, Peter Shaw, Ming-Wei Chang, Kelvin Guu, Panupong Pasupat, Yuan Zhang

Abstract: Sequence-to-sequence (seq2seq) models are prevalent in semantic parsing, but have been found to struggle with out-of-distribution compositional generalization. While specialized model architectures and pre-training of seq2seq models have been proposed to address this issue, the former often comes at the cost of generality and the latter only shows limited success. In this paper, we study the impact of intermediate representations on compositional generalization in pre-trained seq2seq models, without changing the model architecture at all, and identify key aspects for designing effective representations. Instead of training to directly map natural language to an executable form, we map to a reversible or lossy intermediate representation that has stronger structural correspondence with natural language. The combination of our proposed intermediate representations and pre-trained models is surprisingly effective, where the best combinations obtain a new state-of-the-art on CFQ (+14.8 accuracy points) and on the template-splits of three text-to-SQL datasets (+15.0 to +19.4 accuracy points). This work highlights that intermediate representations provide an important and potentially overlooked degree of freedom for improving the compositional generalization abilities of pre-trained seq2seq models.
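
The abstract does not spell out its intermediate representations. As a hypothetical illustration of what a reversible one can look like, the sketch below shortens a list of SPARQL-style triples by grouping objects that share a subject and predicate, then expands the grouped form back. The triple format and function names are assumptions, and the round trip recovers the original triples up to their order.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
GroupedTriple = Tuple[str, str, Tuple[str, ...]]


def to_grouped_ir(triples: List[Triple]) -> List[GroupedTriple]:
    """Merge triples that share (subject, predicate) into one entry listing all objects."""
    groups: Dict[Tuple[str, str], List[str]] = {}  # dicts keep insertion order
    for subj, pred, obj in triples:
        groups.setdefault((subj, pred), []).append(obj)
    return [(subj, pred, tuple(objs)) for (subj, pred), objs in groups.items()]


def from_grouped_ir(ir: List[GroupedTriple]) -> List[Triple]:
    """Expand the grouped form back into individual triples (the exact inverse, up to order)."""
    return [(subj, pred, obj) for subj, pred, objs in ir for obj in objs]


query = [
    ("?x0", "a", "film"),
    ("?x0", "directed_by", "M0"),
    ("?x0", "directed_by", "M1"),
]
ir = to_grouped_ir(query)
print(ir)  # [('?x0', 'a', ('film',)), ('?x0', 'directed_by', ('M0', 'M1'))]
```

A lossy representation, by contrast, would drop information that then has to be restored by a separate post-processing step.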

CapWAP: Captioning with a Purpose

Nov 09, 2020

Adam Fisch, Kenton Lee, Ming-Wei Chang, Jonathan H. Clark, Regina Barzilay

Abstract: The traditional image captioning task uses generic reference captions to provide textual information about images. Different user populations, however, will care about different visual aspects of images. In this paper, we propose a new task, Captioning with a Purpose (CapWAP). Our goal is to develop systems that can be tailored to be useful for the information needs of an intended population, rather than merely provide generic information about an image. In this task, we use question-answer (QA) pairs from users (a natural expression of information need) instead of reference captions, for both training and post-inference evaluation. We show that it is possible to use reinforcement learning to directly optimize for the intended information need, by rewarding outputs that allow a question answering model to provide correct answers to sampled user questions. We convert several visual question answering datasets into CapWAP datasets, and demonstrate that under a variety of scenarios our purposeful captioning system learns to anticipate and fulfill specific information needs better than its generic counterparts, as measured by QA performance on user questions from unseen images, when using the caption alone as context.

* EMNLP 2020
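
The abstract rewards captions that let a QA model answer sampled user questions. A minimal sketch of such a reward, assuming exact-match scoring and a hypothetical `toy_qa` stand-in for a trained QA model, together with a standard REINFORCE surrogate loss with a baseline. The paper's exact reward and gradient estimator are not specified here.

```python
from typing import Callable, List, Tuple

QAModel = Callable[[str, str], str]  # (question, context) -> answer string


def qa_reward(caption: str, qa_pairs: List[Tuple[str, str]], qa_model: QAModel) -> float:
    """Fraction of user questions answered correctly using only the caption as context."""
    if not qa_pairs:
        return 0.0
    correct = sum(
        qa_model(question, caption).strip().lower() == answer.strip().lower()
        for question, answer in qa_pairs
    )
    return correct / len(qa_pairs)


def reinforce_loss(log_prob: float, reward: float, baseline: float) -> float:
    """Policy-gradient surrogate; minimizing it raises log_prob when reward > baseline.

    In a real system log_prob would be a differentiable tensor from the captioner.
    """
    return -(reward - baseline) * log_prob


def toy_qa(question: str, context: str) -> str:
    # Hypothetical stand-in for a trained QA model.
    return "red" if "red" in context else "unknown"


pairs = [("What color is the car?", "red"), ("How many people are inside?", "two")]
r = qa_reward("a red car parked on a street", pairs, toy_qa)
print(r, reinforce_loss(log_prob=-2.0, reward=r, baseline=0.25))  # 0.5 0.5
```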

Compositional Generalization and Natural Language Variation: Can a Semantic Parsing Approach Handle Both?

Oct 24, 2020

Peter Shaw, Ming-Wei Chang, Panupong Pasupat, Kristina Toutanova

Abstract: Sequence-to-sequence models excel at handling natural language variation, but have been shown to struggle with out-of-distribution compositional generalization. This has motivated new specialized architectures with stronger compositional biases, but most of these approaches have only been evaluated on synthetically-generated datasets, which are not representative of natural language variation. In this work we ask: can we develop a semantic parsing approach that handles both natural language variation and compositional generalization? To better assess this capability, we propose new train and test splits of non-synthetic datasets. We demonstrate that strong existing semantic parsing approaches do not yet perform well across a broad set of evaluations. We also propose NQG-T5, a hybrid model that combines a high-precision grammar-based approach with a pre-trained sequence-to-sequence model. It outperforms existing approaches across several compositional generalization challenges, while also being competitive with the state-of-the-art on standard evaluations. While still far from solving this problem, our study highlights the importance of diverse evaluations and the open challenge of handling both compositional generalization and natural language variation in semantic parsing.
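
One simple way to combine a high-precision, low-coverage grammar-based parser with a seq2seq model is to use the grammar's parse whenever it produces one and fall back to the seq2seq model otherwise. The sketch below assumes that rule and uses hypothetical stubs (`toy_grammar`, `toy_seq2seq`) in place of the grammar-based component and T5; it is not the released NQG-T5 code.

```python
from typing import Callable, Dict, Optional


def hybrid_parse(
    utterance: str,
    grammar_parser: Callable[[str], Optional[str]],
    seq2seq_parser: Callable[[str], str],
) -> str:
    """Prefer the high-precision grammar parse; fall back to seq2seq when it abstains."""
    parse = grammar_parser(utterance)
    return parse if parse is not None else seq2seq_parser(utterance)


# Hypothetical stubs: a grammar that covers only one utterance, and a fake seq2seq model.
TOY_RULES: Dict[str, str] = {"largest state": "argmax(state, size)"}


def toy_grammar(utterance: str) -> Optional[str]:
    return TOY_RULES.get(utterance)  # None means "no derivation found"


def toy_seq2seq(utterance: str) -> str:
    return f"<seq2seq output for: {utterance}>"


print(hybrid_parse("largest state", toy_grammar, toy_seq2seq))  # argmax(state, size)
print(hybrid_parse("longest river", toy_grammar, toy_seq2seq))  # <seq2seq output for: longest river>
```

The design relies on the grammar abstaining rather than guessing: its precision is preserved on the inputs it covers, and the seq2seq model supplies coverage for everything else.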

Open Question Answering over Tables and Text

Oct 20, 2020

Wenhu Chen, Ming-Wei Chang, Eva Schlinger, William Wang, William W. Cohen

Abstract: In open question answering (QA), the answer to a question is produced by retrieving and then analyzing documents that might contain answers to the question. Most open QA systems have considered only retrieving information from unstructured text. Here we consider, for the first time, open QA over both tabular and textual data and present a new large-scale dataset, Open Table-Text Question Answering (OTT-QA), to evaluate performance on this task. Most questions in OTT-QA require multi-hop inference across tabular data and unstructured text, and the evidence required to answer a question can be distributed in different ways over these two types of input, making evidence retrieval challenging: our baseline model using an iterative retriever and BERT-based reader achieves an exact match score of less than 10%. We then propose two novel techniques to address the challenge of retrieving and aggregating evidence for OTT-QA. The first technique is to use "early fusion" to group multiple highly relevant tabular and textual units into a fused block, which provides more context for the retriever to search for. The second technique is to use a cross-block reader to model the cross-dependency between multiple pieces of retrieved evidence with global-local sparse attention. Combining these two techniques improves the score significantly, to above 27%.

* Technical Report
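
"Early fusion" groups related tabular and textual units into one retrievable block. A minimal sketch, assuming a table row is flattened into text and the passages linked from its cells are appended to it. The `linked_passages` mapping and the output format are hypothetical, and how the paper actually finds related passages is not shown.

```python
from typing import Dict, List


def fuse_row(
    table_title: str,
    header: List[str],
    row: List[str],
    linked_passages: Dict[str, str],
) -> str:
    """Flatten one table row and append passages linked from its cells into a single block."""
    cells = "; ".join(f"{col} is {val}" for col, val in zip(header, row))
    passages = " ".join(linked_passages[val] for val in row if val in linked_passages)
    return f"{table_title}. {cells}. {passages}".strip()


block = fuse_row(
    "Toy table",
    ["Name", "City"],
    ["Tower A", "Springfield"],
    {"Springfield": "Springfield is a town in the toy example."},
)
print(block)
# Toy table. Name is Tower A; City is Springfield. Springfield is a town in the toy example.
```

Each fused block would then be indexed as a single retrieval unit, so one retrieval hit brings back both the row and its linked text.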

Probabilistic Assumptions Matter: Improved Models for Distantly-Supervised Document-Level Question Answering

May 05, 2020

Hao Cheng, Ming-Wei Chang, Kenton Lee, Kristina Toutanova

Abstract: We address the problem of extractive question answering using document-level distant supervision, pairing questions and relevant documents with answer strings. We compare previously used probability space and distant supervision assumptions (assumptions on the correspondence between the weak answer string labels and possible answer mention spans). We show that these assumptions interact, and that different configurations provide complementary benefits. We demonstrate that a multi-objective model can efficiently combine the advantages of multiple assumptions and outperform the best individual formulation. Our approach outperforms previous state-of-the-art models by 4.3 points in F1 on TriviaQA-Wiki and 1.7 points in Rouge-L on NarrativeQA summaries.

* ACL 2020
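
Two common distant-supervision assumptions about spans that match the answer string are that every matching span is correct, or that at least one of them is. A sketch of the corresponding losses over the model's probabilities for the matching spans, with a hypothetical `weight` to mix them. The paper's actual objectives, probability spaces (for example paragraph-level vs. document-level normalization), and weighting may differ.

```python
import math
from typing import Sequence


def loss_all_mentions(mention_probs: Sequence[float]) -> float:
    """Assume every matching span is a correct answer: -sum_i log p_i."""
    return -sum(math.log(p) for p in mention_probs)


def loss_any_mention(mention_probs: Sequence[float]) -> float:
    """Assume at least one matching span is correct: -log sum_i p_i."""
    return -math.log(sum(mention_probs))


def multi_objective_loss(mention_probs: Sequence[float], weight: float = 0.5) -> float:
    """Hypothetical weighted mix of the two assumptions."""
    return weight * loss_all_mentions(mention_probs) + (1.0 - weight) * loss_any_mention(
        mention_probs
    )


probs = [0.5, 0.25]  # model probabilities of two spans that match the answer string
print(loss_all_mentions(probs), loss_any_mention(probs), multi_objective_loss(probs))
```

The "all mentions" loss pushes up every matching span, including spurious matches, while the "any mention" loss lets the model concentrate probability on whichever match it finds most plausible.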

REALM: Retrieval-Augmented Language Model Pre-Training

Feb 10, 2020

Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, Ming-Wei Chang

Abstract: Language model pre-training has been shown to capture a surprising amount of world knowledge, crucial for NLP tasks such as question answering. However, this knowledge is stored implicitly in the parameters of a neural network, requiring ever-larger networks to cover more facts. To capture knowledge in a more modular and interpretable way, we augment language model pre-training with a latent knowledge retriever, which allows the model to retrieve and attend over documents from a large corpus such as Wikipedia, used during pre-training, fine-tuning and inference. For the first time, we show how to pre-train such a knowledge retriever in an unsupervised manner, using masked language modeling as the learning signal and backpropagating through a retrieval step that considers millions of documents. We demonstrate the effectiveness of Retrieval-Augmented Language Model pre-training (REALM) by fine-tuning on the challenging task of Open-domain Question Answering (Open-QA). We compare against state-of-the-art models for both explicit and implicit knowledge storage on three popular Open-QA benchmarks, and find that we outperform all previous methods by a significant margin (4-16% absolute accuracy), while also providing qualitative benefits such as interpretability and modularity.
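
REALM treats the retrieved document z as a latent variable and marginalizes over it, p(y|x) = Σ_z p(y|z,x) p(z|x), with p(z|x) a softmax over inner products of query and document embeddings. The sketch computes that marginal over a tiny explicit document list with made-up embeddings and made-up p(y|z,x) values; at REALM's scale the sum is approximated over the top-k documents found by maximum inner product search, which is not shown.

```python
import math
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def retrieval_distribution(query_emb: Sequence[float], doc_embs: List[Sequence[float]]) -> List[float]:
    """p(z|x): softmax over inner-product relevance scores (max-subtracted for stability)."""
    scores = [dot(query_emb, d) for d in doc_embs]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def marginal_likelihood(
    query_emb: Sequence[float],
    doc_embs: List[Sequence[float]],
    p_y_given_z: Sequence[float],
) -> float:
    """p(y|x) = sum_z p(y|z,x) * p(z|x) over the given documents."""
    p_z = retrieval_distribution(query_emb, doc_embs)
    return sum(pz * py for pz, py in zip(p_z, p_y_given_z))


# Toy example: the query embedding points at document 0, which also predicts y well.
query = [1.0, 0.0]
docs = [[1.0, 0.0], [0.0, 1.0]]
print(marginal_likelihood(query, docs, p_y_given_z=[0.9, 0.1]))
```

Because p(z|x) enters the likelihood, gradients of log p(y|x) also reach the query and document embeddings, which is what lets the retriever learn from the language-modeling signal.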

Well-Read Students Learn Better: On the Importance of Pre-training Compact Models

Sep 25, 2019

Iulia Turc, Ming-Wei Chang, Kenton Lee, Kristina Toutanova

Abstract: Recent developments in natural language representations have been accompanied by large and expensive models that leverage vast amounts of general-domain text through self-supervised pre-training. Due to the cost of applying such models to downstream tasks, several model compression techniques on pre-trained language representations have been proposed (Sun et al., 2019; Sanh, 2019). However, surprisingly, the simple baseline of just pre-training and fine-tuning compact models has been overlooked. In this paper, we first show that pre-training remains important in the context of smaller architectures, and fine-tuning pre-trained compact models can be competitive with more elaborate methods proposed in concurrent work. Starting with pre-trained compact models, we then explore transferring task knowledge from large fine-tuned models through standard knowledge distillation. The resulting simple, yet effective and general algorithm, Pre-trained Distillation, brings further improvements. Through extensive experiments, we more generally explore the interaction between pre-training and distillation under two variables that have been under-studied: model size and properties of unlabeled task data. One surprising observation is that they have a compound effect even when sequentially applied on the same data. To accelerate future research, we will make our 24 pre-trained miniature BERT models publicly available.

* Added comparison to concurrent work
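
The abstract builds on standard knowledge distillation, where the student is trained to match the teacher's soft output distribution. A minimal sketch of that objective as a cross-entropy between teacher and student softmax outputs, assuming logits given as plain lists and a hypothetical `temperature` parameter; it is not the paper's training code.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    temperature: float = 1.0,
) -> float:
    """Cross-entropy of the student's distribution against the teacher's soft labels."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))


teacher = [2.0, 0.0]  # e.g. a large fine-tuned model's logits on one unlabeled example
print(distillation_loss([2.0, 0.0], teacher), distillation_loss([0.0, 2.0], teacher))
```

Because the targets come from the teacher rather than from gold labels, this loss can be computed on unlabeled task data, which is the setting the abstract studies.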

Zero-Shot Entity Linking by Reading Entity Descriptions

Jun 18, 2019

Lajanugen Logeswaran, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, Jacob Devlin, Honglak Lee

Abstract: We present the zero-shot entity linking task, where mentions must be linked to unseen entities without in-domain labeled data. The goal is to enable robust transfer to highly specialized domains, and so no metadata or alias tables are assumed. In this setting, entities are only identified by text descriptions, and models must rely strictly on language understanding to resolve the new entities. First, we show that strong reading comprehension models pre-trained on large unlabeled data can be used to generalize to unseen entities. Second, we propose a simple and effective adaptive pre-training strategy, which we term domain-adaptive pre-training (DAP), to address the domain shift problem associated with linking unseen entities in a new domain. We present experiments on a new dataset that we construct for this task and show that DAP improves over strong pre-training baselines, including BERT. The data and code are available at https://github.com/lajanugen/zeshel.

* ACL 2019
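
In the zero-shot setting each entity is known only by its text description. The sketch below shows just that interface: every candidate is scored by comparing the mention context with its description, with no alias tables or metadata. `overlap_score` is a crude, hypothetical placeholder for the paper's reading-comprehension model, and the entity names and texts are made up.

```python
import re
from typing import Callable, Dict, List, Set, Tuple


def tokens(text: str) -> Set[str]:
    return set(re.findall(r"\w+", text.lower()))


def overlap_score(mention_context: str, description: str) -> float:
    """Jaccard overlap of word sets: a crude, hypothetical stand-in for a learned reader."""
    a, b = tokens(mention_context), tokens(description)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_entities(
    mention_context: str,
    descriptions: Dict[str, str],
    scorer: Callable[[str, str], float] = overlap_score,
) -> List[Tuple[str, float]]:
    """Score every entity by its description alone, highest first."""
    scored = [(entity, scorer(mention_context, desc)) for entity, desc in descriptions.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


entities = {
    "Orc (creature)": "A brutish creature that attacks the player in fantasy game worlds.",
    "Orc (language)": "A programming language for orchestration.",
}
ranking = rank_entities("The orc attacked the player early in the game.", entities)
print(ranking[0][0])  # Orc (creature)
```

Swapping in a learned `scorer` that reads the mention and description jointly leaves the rest of the interface unchanged.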

Latent Retrieval for Weakly Supervised Open Domain Question Answering

Jun 06, 2019

Kenton Lee, Ming-Wei Chang, Kristina Toutanova

Abstract: Recent work on open domain question answering (QA) assumes strong supervision of the supporting evidence and/or assumes a blackbox information retrieval (IR) system to retrieve evidence candidates. We argue that both are suboptimal, since gold evidence is not always available, and QA is fundamentally different from IR. We show for the first time that it is possible to jointly learn the retriever and reader from question-answer string pairs and without any IR system. In this setting, evidence retrieval from all of Wikipedia is treated as a latent variable. Since this is impractical to learn from scratch, we pre-train the retriever with an Inverse Cloze Task. We evaluate on open versions of five QA datasets. On datasets where the questioner already knows the answer, a traditional IR system such as BM25 is sufficient. On datasets where a user is genuinely seeking an answer, we show that learned retrieval is crucial, outperforming BM25 by up to 19 points in exact match.

* Accepted to ACL 2019
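
The Inverse Cloze Task pre-trains the retriever by treating a sentence as a pseudo-question and its surrounding passage as the pseudo-evidence to retrieve. The sketch builds one such (query, evidence) pair from a passage split into sentences; `remove_prob` is a hypothetical knob for sometimes leaving the sentence in its context, and the retrieval loss that would consume these pairs is omitted.

```python
import random
from typing import List, Tuple


def make_ict_pair(
    sentences: List[str], rng: random.Random, remove_prob: float = 1.0
) -> Tuple[str, str]:
    """Turn one passage into an Inverse Cloze Task (pseudo-query, pseudo-evidence) pair."""
    i = rng.randrange(len(sentences))
    query = sentences[i]
    if rng.random() < remove_prob:
        context = sentences[:i] + sentences[i + 1:]  # query sentence removed
    else:
        context = list(sentences)  # hypothetical: occasionally keep it in place
    return query, " ".join(context)


passage = ["Zebras have stripes.", "They live in Africa.", "Each stripe pattern differs."]
query, evidence = make_ict_pair(passage, random.Random(0))
print(query, "->", evidence)
```

During pre-training, the retriever would be asked to pick each pair's evidence out of the other passages in the batch, which requires no labeled questions.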
