Jingfei Du

Self-training Improves Pre-training for Natural Language Understanding

Oct 05, 2020

Jingfei Du, Edouard Grave, Beliz Gunel, Vishrav Chaudhary, Onur Celebi, Michael Auli, Ves Stoyanov, Alexis Conneau

Abstract: Unsupervised pre-training has led to much recent progress in natural language understanding. In this paper, we study self-training as another way to leverage unlabeled data through semi-supervised learning. To obtain additional data for a specific task, we introduce SentAugment, a data augmentation method which computes task-specific query embeddings from labeled data to retrieve sentences from a bank of billions of unlabeled sentences crawled from the web. Unlike previous semi-supervised methods, our approach does not require in-domain unlabeled data and is therefore more generally applicable. Experiments show that self-training is complementary to strong RoBERTa baselines on a variety of tasks. Our augmentation approach leads to scalable and effective self-training with improvements of up to 2.6% on standard text classification benchmarks. Finally, we also show strong gains on knowledge-distillation and few-shot learning.
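
The retrieval step described in this abstract can be illustrated with a minimal sketch. It assumes sentence embeddings are already available as plain lists of floats, builds a single task query by averaging the labeled-example embeddings, and ranks a tiny in-memory bank by cosine similarity. The function names (`mean_embedding`, `retrieve`), the toy vectors, and the single-averaged-query choice are all illustrative assumptions, not the authors' implementation; a bank of billions of sentences would need an approximate nearest-neighbor index rather than a linear scan.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def mean_embedding(vectors: Sequence[Vector]) -> List[float]:
    """Average the labeled-example embeddings into one task-level query."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def retrieve(query: Vector, bank: Sequence[Tuple[str, Vector]], k: int) -> List[Tuple[str, float]]:
    """Return the k bank sentences with the highest cosine similarity to the query."""
    q = normalize(query)
    scored = [
        (text, sum(a * b for a, b in zip(q, normalize(vec))))
        for text, vec in bank
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy data (hypothetical): two labeled sentiment examples and a three-sentence bank.
labeled = [[1.0, 0.0, 0.1], [0.9, 0.1, 0.0]]
bank = [
    ("great film, loved it", [0.8, 0.1, 0.1]),
    ("stock prices fell", [0.0, 1.0, 0.2]),
    ("a moving, well-acted drama", [1.0, 0.0, 0.0]),
]

query = mean_embedding(labeled)
top = retrieve(query, bank, k=2)
for text, score in top:
    print(f"{score:.3f}  {text}")
```

In a self-training setup, the retrieved sentences would then be labeled by a model trained on the original labeled data and used as additional training examples; that step is omitted here.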

* 8 pages

Answering Complex Open-Domain Questions with Multi-Hop Dense Retrieval

Sep 27, 2020

Wenhan Xiong, Xiang Lorraine Li, Srini Iyer, Jingfei Du, Patrick Lewis, William Yang Wang, Yashar Mehdad, Wen-tau Yih, Sebastian Riedel, Douwe Kiela (+1 more)

Abstract: We propose a simple and efficient multi-hop dense retrieval approach for answering complex open-domain questions, which achieves state-of-the-art performance on two multi-hop datasets, HotpotQA and multi-evidence FEVER. Contrary to previous work, our method does not require access to any corpus-specific information, such as inter-document hyperlinks or human-annotated entity markers, and can be applied to any unstructured text corpus. Our system also yields a much better efficiency-accuracy trade-off, matching the best published accuracy on HotpotQA while being 10 times faster at inference time.

General Purpose Text Embeddings from Pre-trained Language Models for Scalable Inference

Apr 29, 2020

Jingfei Du, Myle Ott, Haoran Li, Xing Zhou, Veselin Stoyanov

Abstract: The state of the art on many NLP tasks is currently achieved by large pre-trained language models, which require a considerable amount of computation. We explore a setting where many different predictions are made on a single piece of text. In that case, some of the computational cost during inference can be amortized over the different tasks using a shared text encoder. We compare approaches for training such an encoder and show that encoders pre-trained over multiple tasks generalize well to unseen tasks. We also compare ways of extracting fixed- and limited-size representations from this encoder, including different ways of pooling features extracted from multiple layers or positions. Our best approach compares favorably to knowledge distillation, achieving higher accuracy and lower computational cost once the system is handling around 7 tasks. Further, we show that through binary quantization, we can reduce the size of the extracted representations by a factor of 16, making it feasible to store them for later use. The resulting method offers a compelling solution for using large-scale pre-trained models at a fraction of the computational cost when multiple tasks are performed on the same text.
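
To make the storage arithmetic behind binary quantization concrete, here is a minimal sketch. It assumes the original representation is stored as 16-bit floats and that each dimension is reduced to a single sign bit, which is one way to arrive at a factor of 16; the abstract does not state the baseline precision or the thresholding rule, so both are assumptions, and `binarize` and `unpack` are hypothetical helper names.

```python
from typing import List, Sequence


def binarize(vec: Sequence[float]) -> bytes:
    """Keep one bit per dimension (1 if the value is > 0, else 0), 8 dimensions per byte."""
    out = bytearray((len(vec) + 7) // 8)
    for i, x in enumerate(vec):
        if x > 0:
            out[i // 8] |= 1 << (7 - i % 8)  # most significant bit first
    return bytes(out)


def unpack(packed: bytes, dim: int) -> List[float]:
    """Recover a +1/-1 vector from the packed bits for use by a downstream classifier."""
    return [1.0 if (packed[i // 8] >> (7 - i % 8)) & 1 else -1.0 for i in range(dim)]


# Toy 16-dimensional embedding (hypothetical values).
embedding = [0.3, -1.2, 0.0, 2.5, -0.1, 0.7, -0.4, 0.9,
             1.1, -2.0, 0.2, -0.3, 0.5, 0.6, -0.8, 0.4]

packed = binarize(embedding)
fp16_bytes = len(embedding) * 2      # assumed baseline: 2 bytes per dimension
ratio = fp16_bytes / len(packed)     # 32 bytes / 2 bytes

print(len(packed), "bytes after quantization; compression ratio:", ratio)
```

Against a 32-bit float baseline the same scheme would give a factor of 32, so the reported factor depends on the precision of the stored features.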

Pretrained Encyclopedia: Weakly Supervised Knowledge-Pretrained Language Model

Dec 20, 2019

Wenhan Xiong, Jingfei Du, William Yang Wang, Veselin Stoyanov

Abstract: Recent breakthroughs of pretrained language models have shown the effectiveness of self-supervised learning for a wide range of natural language processing (NLP) tasks. In addition to standard syntactic and semantic NLP tasks, pretrained models achieve strong improvements on tasks that involve real-world knowledge, suggesting that large-scale language modeling could be an implicit method to capture knowledge. In this work, we further investigate the extent to which pretrained models such as BERT capture knowledge using a zero-shot fact completion task. Moreover, we propose a simple yet effective weakly supervised pretraining objective, which explicitly forces the model to incorporate knowledge about real-world entities. Models trained with our new objective yield significant improvements on the fact completion task. When applied to downstream tasks, our model consistently outperforms BERT on four entity-related question answering datasets (i.e., WebQuestions, TriviaQA, SearchQA and Quasar-T) with an average improvement of 2.7 F1, and on a standard fine-grained entity typing dataset (i.e., FIGER) with an accuracy gain of 5.7.

* Accepted to ICLR 2020

RoBERTa: A Robustly Optimized BERT Pretraining Approach

Jul 26, 2019

Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov

Abstract: Language model pretraining has led to significant performance gains but careful comparison between different approaches is challenging. Training is computationally expensive, often done on private datasets of different sizes, and, as we will show, hyperparameter choices have significant impact on the final results. We present a replication study of BERT pretraining (Devlin et al., 2019) that carefully measures the impact of many key hyperparameters and training data size. We find that BERT was significantly undertrained, and can match or exceed the performance of every model published after it. Our best model achieves state-of-the-art results on GLUE, RACE and SQuAD. These results highlight the importance of previously overlooked design choices, and raise questions about the source of recently reported improvements. We release our models and code.

Knowledge-Augmented Language Model and its Application to Unsupervised Named-Entity Recognition

Apr 09, 2019

Angli Liu, Jingfei Du, Veselin Stoyanov

Abstract: Traditional language models are unable to efficiently model entity names observed in text. All but the most popular named entities appear infrequently in text, providing insufficient context. Recent efforts have recognized that context can be generalized between entity names that share the same type (e.g., person or location) and have equipped language models with access to an external knowledge base (KB). Our Knowledge-Augmented Language Model (KALM) continues this line of work by augmenting a traditional model with a KB. Unlike previous methods, however, we train with an end-to-end predictive objective optimizing the perplexity of text. We do not require any additional information such as named entity tags. In addition to improving language modeling performance, KALM learns to recognize named entities in an entirely unsupervised way by using entity type information latent in the model. On a Named Entity Recognition (NER) task, KALM achieves performance comparable with state-of-the-art supervised models. Our work demonstrates that named entities (and possibly other types of world knowledge) can be modeled successfully using predictive learning and training on large corpora of text without any additional information.

* NAACL 2019
