Title: Collecting Entailment Data for Pretraining: New Protocols and Negative Results

Apr 24, 2020

Samuel R. Bowman, Jennimaria Palomaki, Livio Baldini Soares, Emily Pitler

Abstract: Textual entailment (or NLI) data has proven useful as pretraining data for tasks requiring language understanding, even when building on an already-pretrained model like RoBERTa. The standard protocol for collecting NLI was not designed for the creation of pretraining data, and it is likely far from ideal for this purpose. With this application in mind, we propose four alternative protocols, each aimed at improving either the ease with which annotators can produce sound training examples or the quality and diversity of those examples. Using these alternatives and a simple MNLI-based baseline, we collect and compare five new 8.5k-example training sets. Our primary results are solidly negative, with our baseline MNLI-style dataset yielding good transfer performance, but none of our four new methods (nor the recent ANLI) showing any improvements on that baseline. However, we do observe that all four of these interventions, especially the use of seed sentences for inspiration, reduce previously observed issues with annotation artifacts.
