Tal Linzen

Testing learning hypotheses using neural networks by manipulating learning data

Jul 05, 2024

Cara Su-Yi Leong, Tal Linzen

Abstract:Although passivization is productive in English, it is not completely general -- some exceptions exist (e.g. *One hour was lasted by the meeting). How do English speakers learn these exceptions to an otherwise general pattern? Using neural network language models as theories of acquisition, we explore the sources of indirect evidence that a learner can leverage to learn whether a verb can passivize. We first characterize English speakers' judgments of exceptions to the passive, confirming that speakers find some verbs more passivizable than others. We then show that a neural network language model can learn restrictions to the passive that are similar to those displayed by humans, suggesting that evidence for these exceptions is available in the linguistic input. We test the causal role of two hypotheses for how the language model learns these restrictions by training models on modified training corpora, which we create by altering the existing training corpora to remove features of the input implicated by each hypothesis. We find that while the frequency with which a verb appears in the passive significantly affects its passivizability, the semantics of the verb does not. This study highlights the utility of altering a language model's training data for answering questions where complete control over a learner's input is vital.

* Submitted to Journal of Memory and Language
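The abstract does not say how the modified corpora were built. As a rough illustration of the frequency manipulation only, the sketch below drops every sentence in which a target verb appears in a passive frame. The regex heuristic, the helper names `passive_pattern` and `remove_passives`, and the example sentences are hypothetical; a real pipeline would more likely rely on a dependency parser, since this pattern also catches adjectival passives and only finds the participle form it is given.

```python
import re
from typing import List

# Forms of "be" that can head an English passive (a simplifying assumption).
BE_FORMS = r"(?:am|is|are|was|were|be|been|being)"

def passive_pattern(participle: str) -> re.Pattern:
    """Match 'BE (optional -ly adverb) PARTICIPLE', e.g. 'was quickly lasted'."""
    return re.compile(
        rf"\b{BE_FORMS}\s+(?:\w+ly\s+)?{re.escape(participle)}\b",
        re.IGNORECASE,
    )

def remove_passives(sentences: List[str], participle: str) -> List[str]:
    """Hypothetical filter: keep only sentences without a passive use of the verb."""
    pattern = passive_pattern(participle)
    return [s for s in sentences if not pattern.search(s)]

corpus = ["One hour was lasted by the meeting.", "The meeting lasted one hour."]
print(remove_passives(corpus, "lasted"))  # ['The meeting lasted one hour.']
```

Active uses of the verb survive, so the manipulation changes how often the verb is seen in the passive without removing the verb from the corpus.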

[Call for Papers] The 2nd BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus

Apr 09, 2024

Leshem Choshen, Ryan Cotterell, Michael Y. Hu, Tal Linzen, Aaron Mueller, Candace Ross, Alex Warstadt, Ethan Wilcox, Adina Williams, Chengxu Zhuang

Abstract:After last year's successful BabyLM Challenge, the competition will be hosted again in 2024/2025. The overarching goals of the challenge remain the same; however, some of the competition rules will be different. The big changes for this year's competition are as follows: First, we replace the loose track with a paper track, which allows (for example) non-model-based submissions, novel cognitively-inspired benchmarks, or analysis techniques. Second, we are relaxing the rules around pretraining data, and will now allow participants to construct their own datasets provided they stay within the 100M-word or 10M-word budget. Third, we introduce a multimodal vision-and-language track, and will release a corpus of 50% text-only and 50% image-text multimodal data as a starting point for LM training. The purpose of this CfP is to provide rules for this year's challenge, explain these rule changes and their rationale in greater detail, give a timeline of this year's competition, and provide answers to frequently asked questions from last year's challenge.
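Since participants may now assemble their own pretraining data, one practical step is keeping a candidate collection under the word budget. A minimal sketch, assuming whitespace-separated tokens count as words (the CfP summary here does not say how words are counted) and using a hypothetical `take_within_budget` helper that greedily keeps documents in order:

```python
from typing import Iterable, List, Tuple

# The two budgets named in the call; whitespace word counting is an assumption.
BUDGET_100M = 100_000_000
BUDGET_10M = 10_000_000

def count_words(text: str) -> int:
    return len(text.split())

def take_within_budget(docs: Iterable[str], budget_words: int) -> Tuple[List[str], int]:
    """Greedily keep documents in order, skipping any that would exceed the budget."""
    selected: List[str] = []
    used = 0
    for doc in docs:
        n = count_words(doc)
        if used + n > budget_words:
            continue  # skip this document but keep trying later, shorter ones
        selected.append(doc)
        used += n
    return selected, used

print(take_within_budget(["a b c", "d e", "f g h i"], 5))  # (['a b c', 'd e'], 5)
```

In a real submission `budget_words` would be `BUDGET_10M` or `BUDGET_100M`; greedy in-order selection is only one choice, and shuffling or sampling by source would change which documents fit.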

SPAWNing Structural Priming Predictions from a Cognitively Motivated Parser

Mar 11, 2024

Grusha Prasad, Tal Linzen

Abstract:Structural priming is a widely used psycholinguistic paradigm to study human sentence representations. In this work we propose a framework for using empirical priming patterns to build a theory characterizing the structural representations humans construct when processing sentences. This framework uses a new cognitively motivated parser, SPAWN, to generate quantitative priming predictions from theoretical syntax and evaluate these predictions against empirical human behavior. As a case study, we apply this framework to study reduced relative clause representations in English. We use SPAWN to generate priming predictions from two theoretical accounts which make different assumptions about the structure of relative clauses. We find that the predictions from only one of these theories (Participial-Phase) align with empirical priming patterns, thus highlighting which assumptions about relative clauses better capture human sentence representations.

Can You Learn Semantics Through Next-Word Prediction? The Case of Entailment

Feb 29, 2024

William Merrill, Zhaofeng Wu, Norihito Naka, Yoon Kim, Tal Linzen

Abstract:Do LMs infer the semantics of text from co-occurrence patterns in their training data? Merrill et al. (2022) argue that, in theory, probabilities predicted by an optimal LM encode semantic information about entailment relations, but it is unclear whether neural LMs trained on corpora learn entailment in this way because of strong idealizing assumptions made by Merrill et al. In this work, we investigate whether their theory can be used to decode entailment judgments from neural LMs. We find that a test similar to theirs can decode entailment relations between natural sentences, well above random chance, though not perfectly, across many datasets and LMs. This suggests LMs implicitly model aspects of semantics to predict semantic effects on sentence co-occurrence patterns. However, we find the test that predicts entailment in practice works in the opposite direction to the theoretical test. We thus revisit the assumptions underlying the original test, finding its derivation did not adequately account for redundancy in human-written text. We argue that correctly accounting for redundancy related to explanations might derive the observed flipped test and, more generally, improve linguistic theories of human speakers.

* Preprint

In-context Learning Generalizes, But Not Always Robustly: The Case of Syntax

Nov 13, 2023

Aaron Mueller, Albert Webson, Jackson Petty, Tal Linzen

Abstract:In-context learning (ICL) is now a common method for supervising large language models (LLMs): given labeled examples in the input context, the LLM learns to perform the task without weight updates. Despite ICL's prevalence and utility, we understand little about whether models supervised in this manner represent the underlying structure of their tasks, rather than superficial heuristics that only generalize to identically distributed examples. In this study, we investigate the robustness of LLMs supervised via ICL using the test case of sensitivity to syntax, which is a prerequisite for robust language understanding. Our experiments are based on two simple and well-controlled syntactic transformation tasks, where correct out-of-distribution generalization requires an accurate syntactic analysis of the input. We further investigate whether out-of-distribution generalization can be improved via chain-of-thought prompting, where the model is provided with a sequence of intermediate computation steps that illustrate how the task ought to be performed. In experiments with models from the GPT, PaLM, and Llama 2 families, we find large variance across LMs on this fundamental linguistic phenomenon, and that the variance is explained more by the composition of the pre-training corpus and supervision methods than by model size. In particular, we find evidence that models pre-trained on code generalize better, and benefit to a greater extent from chain-of-thought prompting.
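The abstract does not name the two transformation tasks. Question formation is a standard example of a transformation whose out-of-distribution behavior depends on syntactic structure, so the sketch below (a hypothetical illustration, not the paper's data or code) contrasts a linear rule, which fronts the first auxiliary in the string, with a hierarchical rule, which fronts the main-clause auxiliary. The toy auxiliary list and hand-supplied parse are assumptions.

```python
from typing import List

# Toy auxiliary inventory; an assumption for this illustration.
AUXILIARIES = {"is", "are", "can", "does"}

def linear_rule(words: List[str]) -> List[str]:
    """Front the first auxiliary in the word string (structure-blind)."""
    i = next(idx for idx, w in enumerate(words) if w in AUXILIARIES)
    return [words[i]] + words[:i] + words[i + 1:]

def hierarchical_rule(subject: List[str], aux: str, predicate: List[str]) -> List[str]:
    """Front the main-clause auxiliary, given a parse into subject, aux, predicate."""
    return [aux] + subject + predicate

sentence = "the dog that is sleeping is happy".split()
print(" ".join(linear_rule(sentence)))
# prints: is the dog that sleeping is happy
print(" ".join(hierarchical_rule("the dog that is sleeping".split(), "is", ["happy"])))
# prints: is the dog that is sleeping happy
```

Both rules agree on simple sentences such as "the dog is happy", which is why in-distribution accuracy alone cannot tell them apart; only inputs like the relative-clause subject above separate them.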

A Systematic Comparison of Syllogistic Reasoning in Humans and Language Models

Nov 01, 2023

Tiwalayo Eisape, MH Tessler, Ishita Dasgupta, Fei Sha, Sjoerd van Steenkiste, Tal Linzen

Abstract:A central component of rational behavior is logical inference: the process of determining which conclusions follow from a set of premises. Psychologists have documented several ways in which humans' inferences deviate from the rules of logic. Do language models, which are trained on text generated by humans, replicate these biases, or are they able to overcome them? Focusing on the case of syllogisms -- inferences from two simple premises, which have been studied extensively in psychology -- we show that larger models are more logical than smaller ones, and also more logical than humans. At the same time, even the largest models make systematic errors, some of which mirror human reasoning biases such as ordering effects and logical fallacies. Overall, we find that language models mimic the human biases included in their training data, but are able to overcome them in some cases.
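A syllogism is valid when its conclusion holds in every situation where both premises hold. With three terms, a situation is fully described by which of the eight membership combinations (in A or not, in B or not, in C or not) are inhabited, so validity can be checked by brute force over all 256 situations. This is a standard textbook construction, not the paper's evaluation code. It assumes modern semantics, where "All A are B" carries no existential import, so a few traditionally valid moods (e.g. Darapti) come out invalid here.

```python
from itertools import product
from typing import List, Tuple

Statement = Tuple[str, str, str]  # (mood, subject term, predicate term)

# Every membership combination (in_A, in_B, in_C) an individual can have.
TYPES = list(product([False, True], repeat=3))
INDEX = {"A": 0, "B": 1, "C": 2}

def holds(statement: Statement, inhabited: List[Tuple[bool, ...]]) -> bool:
    mood, x, y = statement
    i, j = INDEX[x], INDEX[y]
    if mood == "all":       # All X are Y
        return all(t[j] for t in inhabited if t[i])
    if mood == "no":        # No X are Y
        return not any(t[i] and t[j] for t in inhabited)
    if mood == "some":      # Some X are Y
        return any(t[i] and t[j] for t in inhabited)
    if mood == "some_not":  # Some X are not Y
        return any(t[i] and not t[j] for t in inhabited)
    raise ValueError(f"unknown mood: {mood}")

def valid(premises: List[Statement], conclusion: Statement) -> bool:
    """True if no situation makes all premises true and the conclusion false."""
    for mask in product([False, True], repeat=len(TYPES)):
        inhabited = [t for t, on in zip(TYPES, mask) if on]
        if all(holds(p, inhabited) for p in premises) and not holds(conclusion, inhabited):
            return False  # found a counterexample
    return True

print(valid([("all", "A", "B"), ("all", "B", "C")], ("all", "A", "C")))    # True
print(valid([("some", "A", "B"), ("some", "B", "C")], ("some", "A", "C")))  # False
```

The second case is a classic fallacy that people often endorse: a situation where some A are B and some B are C, but the two B-groups do not overlap, makes both premises true and the conclusion false.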

The Impact of Depth and Width on Transformer Language Model Generalization

Oct 30, 2023

Jackson Petty, Sjoerd van Steenkiste, Ishita Dasgupta, Fei Sha, Dan Garrette, Tal Linzen

Abstract:To process novel sentences, language models (LMs) must generalize compositionally -- combine familiar elements in new ways. What aspects of a model's structure promote compositional generalization? Focusing on transformers, we test the hypothesis, motivated by recent theoretical and empirical work, that transformers generalize more compositionally when they are deeper (have more layers). Because simply adding layers increases the total number of parameters, confounding depth and size, we construct three classes of models which trade off depth for width such that the total number of parameters is kept constant (41M, 134M and 374M parameters). We pretrain all models as LMs and fine-tune them on tasks that test for compositional generalization. We report three main conclusions: (1) after fine-tuning, deeper models generalize better out-of-distribution than shallower models do, but the relative benefit of additional layers diminishes rapidly; (2) within each family, deeper models show better language modeling performance, but returns are similarly diminishing; (3) the benefits of depth for compositional generalization cannot be attributed solely to better performance on language modeling or on in-distribution data.
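The depth/width trade-off can be approximated with the usual back-of-the-envelope count for a standard transformer block: about 4d² parameters for the query, key, value and output projections plus 8d² for a feed-forward layer with hidden size 4d, i.e. roughly 12·L·d² for L layers of width d. Holding that total fixed and solving for d gives the width to pair with each depth. This sketch ignores embeddings, biases and layer norms; the helper names and the rounding to a multiple of 64 are assumptions, and the paper's actual configurations are not given here.

```python
import math

def approx_params(n_layers: int, d_model: int) -> int:
    """Approximate non-embedding parameters: 12 * L * d^2."""
    # 4*d^2 for attention projections + 8*d^2 for a 4*d-wide feed-forward block.
    return 12 * n_layers * d_model ** 2

def width_for_budget(budget: int, n_layers: int, multiple: int = 64) -> int:
    """Width d that keeps 12 * L * d^2 close to `budget`, rounded to `multiple`."""
    d = math.sqrt(budget / (12 * n_layers))
    return max(multiple, round(d / multiple) * multiple)

budget = approx_params(12, 768)  # reference shape only, not one of the paper's models
for depth in (3, 12, 48):
    width = width_for_budget(budget, depth)
    print(depth, width, approx_params(depth, width))
# prints:
# 3 1536 84934656
# 12 768 84934656
# 48 384 84934656
```

Quadrupling depth while halving width leaves 12·L·d² unchanged, so all three shapes match exactly in this idealized count; once embeddings are included, wider models spend relatively more of a fixed budget on the embedding matrix.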

A Language Model with Limited Memory Capacity Captures Interference in Human Sentence Processing

Oct 24, 2023

William Timkey, Tal Linzen

Abstract:Two of the central factors believed to underpin human sentence processing difficulty are expectations and retrieval from working memory. A recent attempt to create a unified cognitive model integrating these two factors relied on the parallels between the self-attention mechanism of transformer language models and cue-based retrieval theories of working memory in human sentence processing (Ryu and Lewis 2021). While Ryu and Lewis show that attention patterns in specialized attention heads of GPT-2 are consistent with similarity-based interference, a key prediction of cue-based retrieval models, their method requires identifying syntactically specialized attention heads, and makes the cognitively implausible assumption that hundreds of memory retrieval operations take place in parallel. In the present work, we develop a recurrent neural language model with a single self-attention head, which more closely parallels the memory system assumed by cognitive theories. We show that our model's single attention head captures semantic and syntactic interference effects observed in human experiments.

* To appear in Findings of the Association for Computational Linguistics: EMNLP 2023
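Cue-based retrieval maps naturally onto a single attention head: the current word issues a query built from retrieval cues, earlier items are stored as keys, and the softmax over query-key similarity determines what is retrieved. The sketch below is a generic scaled dot-product head in pure Python, not the authors' recurrent model; the two cue features (hypothetically, "is a subject" and "is animate") are invented for illustration.

```python
import math
from typing import List, Tuple

Vector = List[float]

def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def retrieve(query: Vector, keys: List[Vector], values: List[Vector]) -> Tuple[Vector, Vector]:
    """Single-head scaled dot-product attention; returns (weights, retrieved value)."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    retrieved = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, retrieved

# Hypothetical cue features: [is_subject, is_animate].
query = [1.0, 1.0]                 # the verb looks for an animate subject
target = [1.0, 1.0]
weak_distractor = [0.0, 0.0]       # matches no cue
strong_distractor = [0.0, 1.0]     # animate, so it matches one cue
values = [[1.0, 0.0], [0.0, 1.0]]  # one-hot item identities

w_low, _ = retrieve(query, [target, weak_distractor], values)
w_high, _ = retrieve(query, [target, strong_distractor], values)
print(round(w_low[0], 2), round(w_high[0], 2))  # 0.8 0.67
```

A distractor that shares a cue with the target pulls attention weight away from it (here from about 0.80 to about 0.67), which is the signature of similarity-based interference that a single head can express without assuming many parallel retrievals.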

SLOG: A Structural Generalization Benchmark for Semantic Parsing

Oct 23, 2023

Bingzhi Li, Lucia Donatelli, Alexander Koller, Tal Linzen, Yuekun Yao, Najoung Kim

Abstract:The goal of compositional generalization benchmarks is to evaluate how well models generalize to new complex linguistic expressions. Existing benchmarks often focus on lexical generalization, the interpretation of novel lexical items in syntactic structures familiar from training; structural generalization tasks, where a model needs to interpret syntactic structures that are themselves unfamiliar from training, are often underrepresented, resulting in overly optimistic perceptions of how well models can generalize. We introduce SLOG, a semantic parsing dataset that extends COGS (Kim and Linzen, 2020) with 17 structural generalization cases. In our experiments, the generalization accuracy of Transformer models, including pretrained ones, only reaches 40.6%, while a structure-aware parser only achieves 70.8%. These results are far from the near-perfect accuracy existing models achieve on COGS, demonstrating the role of SLOG in foregrounding the large discrepancy between models' lexical and structural generalization capacities.

* Accepted to EMNLP 2023

Verb Conjugation in Transformers Is Determined by Linear Encodings of Subject Number

Oct 23, 2023

Sophie Hao, Tal Linzen

Abstract:Deep architectures such as Transformers are sometimes criticized for having uninterpretable "black-box" representations. We use causal intervention analysis to show that, in fact, some linguistic features are represented in a linear, interpretable format. Specifically, we show that BERT's ability to conjugate verbs relies on a linear encoding of subject number that can be manipulated with predictable effects on conjugation accuracy. This encoding is found in the subject position at the first layer and the verb position at the last layer, but distributed across positions at middle layers, particularly when there are multiple cues to subject number.

* To appear in Findings of the Association for Computational Linguistics: EMNLP 2023
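A linear encoding means that subject number corresponds to a direction in representation space, so an intervention can move a hidden state along that direction while leaving orthogonal components untouched. The sketch below shows one common form of such an intervention, setting the projection onto a unit direction u to a chosen value t: h' = h + (t - h·u)u. The abstract does not state whether this matches the paper's exact procedure; the vectors are toy values, and in practice u would be estimated, for example from a linear probe trained on hidden states.

```python
import math
from typing import List

Vector = List[float]

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def unit(v: Vector) -> Vector:
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def set_projection(h: Vector, direction: Vector, target: float) -> Vector:
    """Return h with its component along `direction` set to `target`."""
    u = unit(direction)
    delta = target - dot(h, u)
    return [x + delta * ui for x, ui in zip(h, u)]

def flip(h: Vector, direction: Vector) -> Vector:
    """Reflect h across the hyperplane orthogonal to `direction` (negates the projection)."""
    return set_projection(h, direction, -dot(h, unit(direction)))

h = [1.0, 2.0, 3.0]
number_direction = [0.0, 0.0, 2.0]  # toy "subject number" axis
print(flip(h, number_direction))     # [1.0, 2.0, -3.0]
```

If the encoding is linear and causally used, flipping the projection at the right layer and position should flip the predicted verb form (e.g. "is" versus "are") while leaving other information intact.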