Jason Wei

Language Model Augmented Relevance Score

Aug 19, 2021
Ruibo Liu, Jason Wei, Soroush Vosoughi

Although automated metrics are commonly used to evaluate NLG systems, they often correlate poorly with human judgements. Newer metrics such as BERTScore have addressed many weaknesses in prior metrics such as BLEU and ROUGE, which rely on n-gram matching. These newer methods, however, are still limited in that they do not consider the generation context, so they cannot properly reward generated text that is correct but deviates from the given reference. In this paper, we propose Language Model Augmented Relevance Score (MARS), a new context-aware metric for NLG evaluation. MARS leverages off-the-shelf language models, guided by reinforcement learning, to create augmented references that consider both the generation context and available human references, which are then used as additional references to score generated text. Compared with seven existing metrics in three common NLG tasks, MARS not only achieves higher correlation with human reference judgements, but also differentiates well-formed candidates from adversarial samples to a larger degree.

* In ACL 2021
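
The idea of scoring against additional references can be illustrated with a toy example. The sketch below is not MARS: it substitutes a simple unigram-overlap F1 for the real metric, and the augmented reference is written by hand rather than produced by a language model guided by reinforcement learning. The function names and sentences are hypothetical. It only shows why a correct candidate that deviates from the single human reference can score higher once a context-aware reference is added.

```python
from collections import Counter

def unigram_f1(candidate, reference):
    """Toy stand-in metric: F1 over overlapping unigrams (not MARS itself)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

def multi_reference_score(candidate, references):
    """Score a candidate against its best-matching reference."""
    return max(unigram_f1(candidate, ref) for ref in references)

# Hypothetical data: one human reference plus one hand-written
# "augmented" reference standing in for a model-generated one.
candidate = "she went home early"
human_refs = ["she left the office before noon"]
augmented_refs = human_refs + ["she went home before noon"]

single = multi_reference_score(candidate, human_refs)
augmented = multi_reference_score(candidate, augmented_refs)
print(f"human reference only: {single:.3f}")
print(f"with augmented reference: {augmented:.3f}")
```

Taking the maximum over references is one common design choice for multi-reference scoring; averaging is another, and the abstract does not say which aggregation MARS uses.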

Modulating Language Models with Emotions

Aug 17, 2021
Ruibo Liu, Jason Wei, Chenyan Jia, Soroush Vosoughi

Generating context-aware language that embodies diverse emotions is an important step towards building empathetic NLP systems. In this paper, we propose a formulation of modulated layer normalization -- a technique inspired by computer vision -- that allows us to use large-scale language models for emotional response generation. In automatic and human evaluation on the MojiTalk dataset, our proposed modulated layer normalization method outperforms prior baseline methods while maintaining diversity, fluency, and coherence. Our method also obtains competitive performance even when using only 10% of the available training data.

* Findings of ACL 2021
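
The abstract does not spell out the formulation, so the following is only a minimal sketch of the general idea behind conditional normalization: standard layer normalization followed by a scale and shift that depend on a side input, here an emotion label. The per-emotion `gamma` and `beta` values are made up for illustration; in a real model they would be learned or predicted from an emotion embedding, and the paper's exact parameterization may differ.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and (approximately) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def modulated_layer_norm(x, gamma, beta, eps=1e-5):
    """Layer norm whose scale (gamma) and shift (beta) come from a condition."""
    normed = layer_norm(x, eps)
    return [g * n + b for g, n, b in zip(gamma, normed, beta)]

# Hypothetical per-emotion modulation parameters (assumed, not from the paper).
EMOTION_PARAMS = {
    "neutral": {"gamma": [1.0, 1.0, 1.0, 1.0], "beta": [0.0, 0.0, 0.0, 0.0]},
    "joy": {"gamma": [1.5, 1.0, 0.5, 1.0], "beta": [0.1, 0.0, -0.1, 0.0]},
}

hidden = [0.2, -1.3, 0.7, 2.4]  # a toy hidden state
neutral = modulated_layer_norm(hidden, **EMOTION_PARAMS["neutral"])
joy = modulated_layer_norm(hidden, **EMOTION_PARAMS["joy"])
print("neutral:", [round(v, 3) for v in neutral])
print("joy:    ", [round(v, 3) for v in joy])
```

With identity parameters the function reduces to plain layer normalization, which is why the same hidden state yields different outputs only when the emotion changes the scale and shift.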

The MultiBERTs: BERT Reproductions for Robustness Analysis

Jun 30, 2021
Thibault Sellam, Steve Yadlowsky, Jason Wei, Naomi Saphra, Alexander D'Amour, Tal Linzen, Jasmijn Bastings, Iulia Turc, Jacob Eisenstein, Dipanjan Das, Ian Tenney, Ellie Pavlick

Experiments with pretrained models such as BERT are often based on a single checkpoint. While the conclusions drawn apply to the artifact (i.e., the particular instance of the model), it is not always clear whether they hold for the more general procedure (which includes the model architecture, training data, initialization scheme, and loss function). Recent work has shown that re-running pretraining can lead to substantially different conclusions about performance, suggesting that alternative evaluations are needed to make principled statements about procedures. To address this question, we introduce MultiBERTs: a set of 25 BERT-base checkpoints, trained with hyper-parameters similar to those of the original BERT model but differing in random initialization and data shuffling. The aim is to enable researchers to draw robust and statistically justified conclusions about pretraining procedures. The full release includes 25 fully trained checkpoints, as well as statistical guidelines and a code library implementing our recommended hypothesis testing methods. Finally, for five of these models we release a set of 28 intermediate checkpoints in order to support research on learning dynamics.

* Checkpoints and example analyses: http://goo.gle/multiberts
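
To make the point about single checkpoints concrete, here is a minimal sketch of a percentile bootstrap over pretraining seeds. It is not the code library released with MultiBERTs and does not implement the authors' recommended tests. The accuracy values are invented, and `bootstrap_mean_diff` is a hypothetical helper. It resamples per-seed scores for two procedures and reports a 95% interval for the difference in mean accuracy.

```python
import random

def bootstrap_mean_diff(scores_a, scores_b, n_resamples=10_000, seed=0):
    """Percentile bootstrap for mean(scores_b) - mean(scores_a) across seeds."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = []
    for _ in range(n_resamples):
        sample_a = rng.choices(scores_a, k=len(scores_a))
        sample_b = rng.choices(scores_b, k=len(scores_b))
        diffs.append(sum(sample_b) / len(sample_b) - sum(sample_a) / len(sample_a))
    diffs.sort()
    lower = diffs[int(0.025 * n_resamples)]
    upper = diffs[int(0.975 * n_resamples) - 1]
    observed = sum(scores_b) / len(scores_b) - sum(scores_a) / len(scores_a)
    return observed, (lower, upper)

# Invented downstream accuracies, one per pretraining seed.
base_procedure = [0.842, 0.838, 0.845, 0.840, 0.836]
variant_procedure = [0.851, 0.847, 0.853, 0.846, 0.850]

observed, (lower, upper) = bootstrap_mean_diff(base_procedure, variant_procedure)
print(f"mean difference: {observed:.4f}")
print(f"95% interval: [{lower:.4f}, {upper:.4f}]")
```

If the interval excludes zero, the difference is unlikely to be an artifact of one lucky seed. A fuller analysis would also account for variation across test examples, which this sketch ignores.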

A Cognitive Regularizer for Language Modeling

Jun 10, 2021
Jason Wei, Clara Meister, Ryan Cotterell

The uniform information density (UID) hypothesis, which posits that speakers behaving optimally tend to distribute information uniformly across a linguistic signal, has gained traction in psycholinguistics as an explanation for certain syntactic, morphological, and prosodic choices. In this work, we explore whether the UID hypothesis can be operationalized as an inductive bias for statistical language modeling. Specifically, we augment the canonical MLE objective for training language models with a regularizer that encodes UID. In experiments on ten languages spanning five language families, we find that using UID regularization consistently improves perplexity in language models, having a larger effect when training data is limited. Moreover, via an analysis of generated sequences, we find that UID-regularized language models have other desirable properties, e.g., they generate text that is more lexically diverse. Our results not only suggest that UID is a reasonable inductive bias for language modeling, but also provide an alternative validation of the UID hypothesis using modern-day NLP tools.

* ACL 2021 Camera-ready (fixed ordering of affiliation emojis)
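
One plausible way to encode UID as a regularizer is to penalize the variance of per-token surprisal, so that sequences whose information is spread evenly incur no extra cost. The sketch below assumes that form. The abstract does not give the exact regularizer, and the weight `beta`, the function names, and the probabilities are illustrative.

```python
import math

def surprisals(token_probs):
    """Per-token surprisal, -log p(token | context), in nats."""
    return [-math.log(p) for p in token_probs]

def uid_regularized_loss(token_probs, beta=0.01):
    """Mean negative log-likelihood plus beta times the variance of surprisal.

    The variance penalty is an assumed operationalization of UID.
    """
    s = surprisals(token_probs)
    nll = sum(s) / len(s)
    variance = sum((v - nll) ** 2 for v in s) / len(s)
    return nll + beta * variance, nll, variance

# Probabilities a hypothetical model assigns to the observed tokens.
even_probs = [0.25, 0.25, 0.25, 0.25]    # information spread uniformly
uneven_probs = [0.9, 0.9, 0.01, 0.9]     # one very surprising token

even_loss, even_nll, even_var = uid_regularized_loss(even_probs)
uneven_loss, uneven_nll, uneven_var = uid_regularized_loss(uneven_probs)
print(f"even:   nll={even_nll:.3f} variance={even_var:.3f} loss={even_loss:.3f}")
print(f"uneven: nll={uneven_nll:.3f} variance={uneven_var:.3f} loss={uneven_loss:.3f}")
```

Setting `beta` to zero recovers the plain MLE objective, so the penalty acts purely as an added inductive bias.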

A Survey of Data Augmentation Approaches for NLP

May 29, 2021
Steven Y. Feng, Varun Gangal, Jason Wei, Sarath Chandar, Soroush Vosoughi, Teruko Mitamura, Eduard Hovy

Data augmentation has recently seen increased interest in NLP due to more work in low-resource domains, new tasks, and the popularity of large-scale neural networks that require large amounts of training data. Despite this recent upsurge, this area is still relatively underexplored, perhaps due to the challenges posed by the discrete nature of language data. In this paper, we present a comprehensive and unifying survey of data augmentation for NLP by summarizing the literature in a structured manner. We first introduce and motivate data augmentation for NLP, and then discuss major methodologically representative approaches. Next, we highlight techniques that are used for popular NLP applications and tasks. We conclude by outlining current challenges and directions for future research. Overall, our paper aims to clarify the landscape of existing literature in data augmentation for NLP and motivate additional work in this area. We also present a GitHub repository with a paper list that will be continuously updated at https://github.com/styfeng/DataAug4NLP

* Accepted to ACL 2021 Findings. GitHub repo with paper list at https://github.com/styfeng/DataAug4NLP

Mitigating Political Bias in Language Models Through Reinforced Calibration

Apr 30, 2021
Ruibo Liu, Chenyan Jia, Jason Wei, Guangxuan Xu, Lili Wang, Soroush Vosoughi

Current large-scale language models can be politically biased as a result of the data they are trained on, potentially causing serious problems when they are deployed in real-world settings. In this paper, we describe metrics for measuring political bias in GPT-2 generation and propose a reinforcement learning (RL) framework for mitigating political biases in generated text. By using rewards from word embeddings or a classifier, our RL framework guides debiased generation without having access to the training data or requiring the model to be retrained. In empirical experiments on three attributes sensitive to political bias (gender, location, and topic), our methods reduced bias according to both our metrics and human evaluation, while maintaining readability and semantic coherence.

* In proceedings of the 35th AAAI Conference on Artificial Intelligence
