Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Sebastian Riedel

Training Adaptive Computation for Open-Domain Question Answering with Computational Constraints

Jul 05, 2021

Yuxiang Wu, Pasquale Minervini, Pontus Stenetorp, Sebastian Riedel

Figure 1 for Training Adaptive Computation for Open-Domain Question Answering with Computational Constraints

Figure 2 for Training Adaptive Computation for Open-Domain Question Answering with Computational Constraints

Figure 3 for Training Adaptive Computation for Open-Domain Question Answering with Computational Constraints

Figure 4 for Training Adaptive Computation for Open-Domain Question Answering with Computational Constraints

Abstract:Adaptive Computation (AC) has been shown to be effective in improving the efficiency of Open-Domain Question Answering (ODQA) systems. However, current AC approaches require tuning of all model parameters, and training state-of-the-art ODQA models requires significant computational resources that may not be available for most researchers. We propose Adaptive Passage Encoder, an AC method that can be applied to an existing ODQA model and can be trained efficiently on a single GPU. It keeps the parameters of the base ODQA model fixed, but it overrides the default layer-by-layer computation of the encoder with an AC policy that is trained to optimise the computational efficiency of the model. Our experimental results show that our method improves upon a state-of-the-art model on two datasets, and is also more accurate than previous AC methods due to the stronger base ODQA model. All source code and datasets are available at https://github.com/uclnlp/APE.

* 7 pages, 1 figure, to be published in ACL-IJCNLP 2021

Via

Access Paper or Ask Questions

Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models

Jul 01, 2021

Robert L. Logan IV, Ivana Balažević, Eric Wallace, Fabio Petroni, Sameer Singh, Sebastian Riedel

Figure 1 for Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models

Figure 2 for Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models

Figure 3 for Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models

Figure 4 for Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models

Abstract:Prompting language models (LMs) with training examples and task descriptions has been seen as critical to recent successes in few-shot learning. In this work, we show that finetuning LMs in the few-shot setting can considerably reduce the need for prompt engineering. In fact, one can use null prompts, prompts that contain neither task-specific templates nor training examples, and achieve competitive accuracy to manually-tuned prompts across a wide range of tasks. While finetuning LMs does introduce new parameters for each downstream task, we show that this memory overhead can be substantially reduced: finetuning only the bias terms can achieve comparable or better accuracy than standard finetuning while only updating 0.1% of the parameters. All in all, we recommend finetuning LMs for few-shot learning as it is more accurate, robust to different prompts, and can be made nearly as efficient as using frozen LMs.

Via

Access Paper or Ask Questions

Database Reasoning Over Text

Jun 02, 2021

James Thorne, Majid Yazdani, Marzieh Saeidi, Fabrizio Silvestri, Sebastian Riedel, Alon Halevy

Figure 1 for Database Reasoning Over Text

Figure 2 for Database Reasoning Over Text

Figure 3 for Database Reasoning Over Text

Figure 4 for Database Reasoning Over Text

Abstract:Neural models have shown impressive performance gains in answering queries from natural language text. However, existing works are unable to support database queries, such as "List/Count all female athletes who were born in 20th century", which require reasoning over sets of relevant facts with operations such as join, filtering and aggregation. We show that while state-of-the-art transformer models perform very well for small databases, they exhibit limitations in processing noisy data, numerical operations, and queries that aggregate facts. We propose a modular architecture to answer these database-style queries over multiple spans from text and aggregating these at scale. We evaluate the architecture using WikiNLDB, a novel dataset for exploring such queries. Our architecture scales to databases containing thousands of facts whereas contemporary models are limited by how many facts can be encoded. In direct comparison on small databases, our approach increases overall answer accuracy from 85% to 90%. On larger databases, our approach retains its accuracy whereas transformer baselines could not encode the context.

* To appear at ACL2021

Via

Access Paper or Ask Questions

Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity

Apr 18, 2021

Yao Lu, Max Bartolo, Alastair Moore, Sebastian Riedel, Pontus Stenetorp

Figure 1 for Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity

Figure 2 for Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity

Figure 3 for Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity

Figure 4 for Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity

Abstract:When primed with only a handful of training samples, very large pretrained language models such as GPT-3, have shown competitive results when compared to fully-supervised fine-tuned large pretrained language models. We demonstrate that the order in which the samples are provided can be the difference between near state-of-the-art and random guess performance: Essentially some permutations are "fantastic" and some not. We analyse this phenomenon in detail, establishing that: it is present across model sizes (even for the largest current models), it is not related to a specific subset of samples, and that a given good permutation for one model is not transferable to another. While one could use a development set to determine which permutations are performant, this would deviate from the few-shot setting as it requires additional annotated data. Instead, we use the generative nature of the language models to construct an artificial development set and based on entropy statistics of the candidate permutations from this set we identify performant prompts. Our method improves upon GPT-family models by on average 13% relative across eleven different established text classification tasks.

Via

Access Paper or Ask Questions

Improving Question Answering Model Robustness with Synthetic Adversarial Data Generation

Apr 18, 2021

Max Bartolo, Tristan Thrush, Robin Jia, Sebastian Riedel, Pontus Stenetorp, Douwe Kiela

Figure 1 for Improving Question Answering Model Robustness with Synthetic Adversarial Data Generation

Figure 2 for Improving Question Answering Model Robustness with Synthetic Adversarial Data Generation

Figure 3 for Improving Question Answering Model Robustness with Synthetic Adversarial Data Generation

Figure 4 for Improving Question Answering Model Robustness with Synthetic Adversarial Data Generation

Abstract:Despite the availability of very large datasets and pretrained models, state-of-the-art question answering models remain susceptible to a variety of adversarial attacks and are still far from obtaining human-level language understanding. One proposed way forward is dynamic adversarial data collection, in which a human annotator attempts to create examples for which a model-in-the-loop fails. However, this approach comes at a higher cost per sample and slower pace of annotation, as model-adversarial data requires more annotator effort to generate. In this work, we investigate several answer selection, question generation, and filtering methods that form a synthetic adversarial data generation pipeline that takes human-generated adversarial samples and unannotated text to create synthetic question-answer pairs. Models trained on both synthetic and human-generated data outperform models not trained on synthetic adversarial data, and obtain state-of-the-art results on the AdversarialQA dataset with overall performance gains of 3.7F1. Furthermore, we find that training on the synthetic adversarial data improves model generalisation across domains for non-adversarial data, demonstrating gains on 9 of the 12 datasets for MRQA. Lastly, we find that our models become considerably more difficult to beat by human adversaries, with a drop in macro-averaged validated model error rate from 17.6% to 8.8% when compared to non-augmented models.

Via

Access Paper or Ask Questions

Dynabench: Rethinking Benchmarking in NLP

Apr 07, 2021

Douwe Kiela, Max Bartolo, Yixin Nie, Divyansh Kaushik, Atticus Geiger, Zhengxuan Wu, Bertie Vidgen, Grusha Prasad, Amanpreet Singh, Pratik Ringshia(+9 more)

Figure 1 for Dynabench: Rethinking Benchmarking in NLP

Figure 2 for Dynabench: Rethinking Benchmarking in NLP

Figure 3 for Dynabench: Rethinking Benchmarking in NLP

Abstract:We introduce Dynabench, an open-source platform for dynamic dataset creation and model benchmarking. Dynabench runs in a web browser and supports human-and-model-in-the-loop dataset creation: annotators seek to create examples that a target model will misclassify, but that another person will not. In this paper, we argue that Dynabench addresses a critical need in our community: contemporary models quickly achieve outstanding performance on benchmark tasks but nonetheless fail on simple challenge examples and falter in real-world scenarios. With Dynabench, dataset creation, model development, and model assessment can directly inform each other, leading to more robust and informative benchmarks. We report on four initial NLP tasks, illustrating these concepts and highlighting the promise of the platform, and address potential objections to dynamic benchmarking as a new standard for the field.

* NAACL 2021

Via

Access Paper or Ask Questions

Multilingual Autoregressive Entity Linking

Mar 23, 2021

Nicola De Cao, Ledell Wu, Kashyap Popat, Mikel Artetxe, Naman Goyal, Mikhail Plekhanov, Luke Zettlemoyer, Nicola Cancedda, Sebastian Riedel, Fabio Petroni

Figure 1 for Multilingual Autoregressive Entity Linking

Figure 2 for Multilingual Autoregressive Entity Linking

Figure 3 for Multilingual Autoregressive Entity Linking

Figure 4 for Multilingual Autoregressive Entity Linking

Abstract:We present mGENRE, a sequence-to-sequence system for the Multilingual Entity Linking (MEL) problem -- the task of resolving language-specific mentions to a multilingual Knowledge Base (KB). For a mention in a given language, mGENRE predicts the name of the target entity left-to-right, token-by-token in an autoregressive fashion. The autoregressive formulation allows us to effectively cross-encode mention string and entity names to capture more interactions than the standard dot product between mention and entity vectors. It also enables fast search within a large KB even for mentions that do not appear in mention tables and with no need for large-scale vector indices. While prior MEL works use a single representation for each entity, we match against entity names of as many languages as possible, which allows exploiting language connections between source input and target name. Moreover, in a zero-shot setting on languages with no training data at all, mGENRE treats the target language as a latent variable that is marginalized at prediction time. This leads to over 50% improvements in average accuracy. We show the efficacy of our approach through extensive evaluation including experiments on three popular MEL benchmarks where mGENRE establishes new state-of-the-art results. Code and pre-trained models at https://github.com/facebookresearch/GENRE.

* 20 pages, 8 figures, and 11 tables

Via

Access Paper or Ask Questions

Combining Learning from Demonstration with Learning by Exploration to Facilitate Contact-Rich Tasks

Mar 10, 2021

Yunlei Shi, Zhaopeng Chen, Yansong Wu, Dimitri Henkel, Sebastian Riedel, Hongxu Liu, Qian Feng, Jianwei Zhang

Figure 1 for Combining Learning from Demonstration with Learning by Exploration to Facilitate Contact-Rich Tasks

Figure 2 for Combining Learning from Demonstration with Learning by Exploration to Facilitate Contact-Rich Tasks

Figure 3 for Combining Learning from Demonstration with Learning by Exploration to Facilitate Contact-Rich Tasks

Figure 4 for Combining Learning from Demonstration with Learning by Exploration to Facilitate Contact-Rich Tasks

Abstract:Collaborative robots are expected to be able to work alongside humans and in some cases directly replace existing human workers, thus effectively responding to rapid assembly line changes. Current methods for programming contact-rich tasks, especially in heavily constrained space, tend to be fairly inefficient. Therefore, faster and more intuitive approaches to robot teaching are urgently required. This work focuses on combining visual servoing based learning from demonstration (LfD) and force-based learning by exploration (LbE), to enable fast and intuitive programming of contact-rich tasks with minimal user effort required. Two learning approaches were developed and integrated into a framework, and one relying on human to robot motion mapping (the visual servoing approach) and one on force-based reinforcement learning. The developed framework implements the non-contact demonstration teaching method based on visual servoing approach and optimizes the demonstrated robot target positions according to the detected contact state. The framework has been compared with two most commonly used baseline techniques, pendant-based teaching and hand-guiding teaching. The efficiency and reliability of the framework have been validated through comparison experiments involving the teaching and execution of contact-rich tasks. The framework proposed in this paper has performed the best in terms of teaching time, execution success rate, risk of damage, and ease of use.

* 8 pages, 7 figures, IROS2021 submission

Via

Access Paper or Ask Questions

PAQ: 65 Million Probably-Asked Questions and What You Can Do With Them

Feb 13, 2021

Patrick Lewis, Yuxiang Wu, Linqing Liu, Pasquale Minervini, Heinrich Küttler, Aleksandra Piktus, Pontus Stenetorp, Sebastian Riedel

Figure 1 for PAQ: 65 Million Probably-Asked Questions and What You Can Do With Them

Figure 2 for PAQ: 65 Million Probably-Asked Questions and What You Can Do With Them

Figure 3 for PAQ: 65 Million Probably-Asked Questions and What You Can Do With Them

Figure 4 for PAQ: 65 Million Probably-Asked Questions and What You Can Do With Them

Abstract:Open-domain Question Answering models which directly leverage question-answer (QA) pairs, such as closed-book QA (CBQA) models and QA-pair retrievers, show promise in terms of speed and memory compared to conventional models which retrieve and read from text corpora. QA-pair retrievers also offer interpretable answers, a high degree of control, and are trivial to update at test time with new knowledge. However, these models lack the accuracy of retrieve-and-read systems, as substantially less knowledge is covered by the available QA-pairs relative to text corpora like Wikipedia. To facilitate improved QA-pair models, we introduce Probably Asked Questions (PAQ), a very large resource of 65M automatically-generated QA-pairs. We introduce a new QA-pair retriever, RePAQ, to complement PAQ. We find that PAQ preempts and caches test questions, enabling RePAQ to match the accuracy of recent retrieve-and-read models, whilst being significantly faster. Using PAQ, we train CBQA models which outperform comparable baselines by 5%, but trail RePAQ by over 15%, indicating the effectiveness of explicit retrieval. RePAQ can be configured for size (under 500MB) or speed (over 1K questions per second) whilst retaining high accuracy. Lastly, we demonstrate RePAQ's strength at selective QA, abstaining from answering when it is likely to be incorrect. This enables RePAQ to ``back-off" to a more expensive state-of-the-art model, leading to a combined system which is both more accurate and 2x faster than the state-of-the-art model alone.

Via

Access Paper or Ask Questions

NeurIPS 2020 EfficientQA Competition: Systems, Analyses and Lessons Learned

Jan 01, 2021

Sewon Min, Jordan Boyd-Graber, Chris Alberti, Danqi Chen, Eunsol Choi, Michael Collins, Kelvin Guu, Hannaneh Hajishirzi, Kenton Lee, Jennimaria Palomaki(+43 more)

Figure 1 for NeurIPS 2020 EfficientQA Competition: Systems, Analyses and Lessons Learned

Figure 2 for NeurIPS 2020 EfficientQA Competition: Systems, Analyses and Lessons Learned

Figure 3 for NeurIPS 2020 EfficientQA Competition: Systems, Analyses and Lessons Learned

Figure 4 for NeurIPS 2020 EfficientQA Competition: Systems, Analyses and Lessons Learned

Abstract:We review the EfficientQA competition from NeurIPS 2020. The competition focused on open-domain question answering (QA), where systems take natural language questions as input and return natural language answers. The aim of the competition was to build systems that can predict correct answers while also satisfying strict on-disk memory budgets. These memory budgets were designed to encourage contestants to explore the trade-off between storing large, redundant, retrieval corpora or the parameters of large learned models. In this report, we describe the motivation and organization of the competition, review the best submissions, and analyze system predictions to inform a discussion of evaluation for open-domain QA.

* 26 pages

Via

Access Paper or Ask Questions