Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ryan Cotterell

ETH Zurich

Activation Scaling for Steering and Interpreting Language Models

Oct 07, 2024

Niklas Stoehr, Kevin Du, Vésteinn Snæbjarnarson, Robert West, Ryan Cotterell, Aaron Schein

Abstract:Given the prompt "Rome is in", can we steer a language model to flip its prediction of an incorrect token "France" to a correct token "Italy" by only multiplying a few relevant activation vectors with scalars? We argue that successfully intervening on a model is a prerequisite for interpreting its internal workings. Concretely, we establish a three-term objective: a successful intervention should flip the correct with the wrong token and vice versa (effectiveness), and leave other tokens unaffected (faithfulness), all while being sparse (minimality). Using gradient-based optimization, this objective lets us learn (and later evaluate) a specific kind of efficient and interpretable intervention: activation scaling only modifies the signed magnitude of activation vectors to strengthen, weaken, or reverse the steering directions already encoded in the model. On synthetic tasks, this intervention performs comparably with steering vectors in terms of effectiveness and faithfulness, but is much more minimal allowing us to pinpoint interpretable model components. We evaluate activation scaling from different angles, compare performance on different datasets, and make activation scalars a learnable function of the activation vectors themselves to generalize to varying-length prompts.

* Findings of the Association for Computational Linguistics: EMNLP 2024

Via

Access Paper or Ask Questions

On the Proper Treatment of Tokenization in Psycholinguistics

Oct 03, 2024

Mario Giulianelli, Luca Malagutti, Juan Luis Gastaldi, Brian DuSell, Tim Vieira, Ryan Cotterell

Figure 1 for On the Proper Treatment of Tokenization in Psycholinguistics

Figure 2 for On the Proper Treatment of Tokenization in Psycholinguistics

Figure 3 for On the Proper Treatment of Tokenization in Psycholinguistics

Figure 4 for On the Proper Treatment of Tokenization in Psycholinguistics

Abstract:Language models are widely used in computational psycholinguistics to test theories that relate the negative log probability (the surprisal) of a region of interest (a substring of characters) under a language model to its cognitive cost experienced by readers, as operationalized, for example, by gaze duration on the region. However, the application of modern language models to psycholinguistic studies is complicated by the practice of using tokenization as an intermediate step in training a model. Doing so results in a language model over token strings rather than one over character strings. Vexingly, regions of interest are generally misaligned with these token strings. The paper argues that token-level language models should be (approximately) marginalized into character-level language models before they are used in psycholinguistic studies to compute the surprisal of a region of interest; then, the marginalized character-level language model can be used to compute the surprisal of an arbitrary character substring, which we term a focal area, that the experimenter may wish to use as a predictor. Our proposal of marginalizing a token-level model into a character-level one solves this misalignment issue independently of the tokenization scheme. Empirically, we discover various focal areas whose surprisal is a better psychometric predictor than the surprisal of the region of interest itself.

* Main conference long paper at EMNLP 2024

Via

Access Paper or Ask Questions

Can Transformers Learn $n$-gram Language Models?

Oct 03, 2024

Anej Svete, Nadav Borenstein, Mike Zhou, Isabelle Augenstein, Ryan Cotterell

Figure 1 for Can Transformers Learn $n$-gram Language Models?

Figure 2 for Can Transformers Learn $n$-gram Language Models?

Figure 3 for Can Transformers Learn $n$-gram Language Models?

Figure 4 for Can Transformers Learn $n$-gram Language Models?

Abstract:Much theoretical work has described the ability of transformers to represent formal languages. However, linking theoretical results to empirical performance is not straightforward due to the complex interplay between the architecture, the learning algorithm, and training data. To test whether theoretical lower bounds imply \emph{learnability} of formal languages, we turn to recent work relating transformers to $n$-gram language models (LMs). We study transformers' ability to learn random $n$-gram LMs of two kinds: ones with arbitrary next-symbol probabilities and ones where those are defined with shared parameters. We find that classic estimation techniques for $n$-gram LMs such as add-$\lambda$ smoothing outperform transformers on the former, while transformers perform better on the latter, outperforming methods specifically designed to learn $n$-gram LMs.

Via

Access Paper or Ask Questions

Generalized Measures of Anticipation and Responsivity in Online Language Processing

Sep 16, 2024

Mario Giulianelli, Andreas Opedal, Ryan Cotterell

Figure 1 for Generalized Measures of Anticipation and Responsivity in Online Language Processing

Figure 2 for Generalized Measures of Anticipation and Responsivity in Online Language Processing

Figure 3 for Generalized Measures of Anticipation and Responsivity in Online Language Processing

Figure 4 for Generalized Measures of Anticipation and Responsivity in Online Language Processing

Abstract:We introduce a generalization of classic information-theoretic measures of predictive uncertainty in online language processing, based on the simulation of expected continuations of incremental linguistic contexts. Our framework provides a formal definition of anticipatory and responsive measures, and it equips experimenters with the tools to define new, more expressive measures beyond standard next-symbol entropy and surprisal. While extracting these standard quantities from language models is convenient, we demonstrate that using Monte Carlo simulation to estimate alternative responsive and anticipatory measures pays off empirically: New special cases of our generalized formula exhibit enhanced predictive power compared to surprisal for human cloze completion probability as well as ELAN, LAN, and N400 amplitudes, and greater complementarity with surprisal in predicting reading times.

Via

Access Paper or Ask Questions

On the Role of Context in Reading Time Prediction

Sep 12, 2024

Andreas Opedal, Eleanor Chodroff, Ryan Cotterell, Ethan Gotlieb Wilcox

Figure 1 for On the Role of Context in Reading Time Prediction

Figure 2 for On the Role of Context in Reading Time Prediction

Figure 3 for On the Role of Context in Reading Time Prediction

Figure 4 for On the Role of Context in Reading Time Prediction

Abstract:We present a new perspective on how readers integrate context during real-time language comprehension. Our proposals build on surprisal theory, which posits that the processing effort of a linguistic unit (e.g., a word) is an affine function of its in-context information content. We first observe that surprisal is only one out of many potential ways that a contextual predictor can be derived from a language model. Another one is the pointwise mutual information (PMI) between a unit and its context, which turns out to yield the same predictive power as surprisal when controlling for unigram frequency. Moreover, both PMI and surprisal are correlated with frequency. This means that neither PMI nor surprisal contains information about context alone. In response to this, we propose a technique where we project surprisal onto the orthogonal complement of frequency, yielding a new contextual predictor that is uncorrelated with frequency. Our experiments show that the proportion of variance in reading times explained by context is a lot smaller when context is represented by the orthogonalized predictor. From an interpretability standpoint, this indicates that previous studies may have overstated the role that context has in predicting reading times.

Via

Access Paper or Ask Questions

Do Language Models Have a Critical Period for Language Acquisition?

Jul 27, 2024

Ionut Constantinescu, Tiago Pimentel, Ryan Cotterell, Alex Warstadt

Abstract:Humans appear to have a critical period (CP) for language acquisition: Second language (L2) acquisition becomes harder after early childhood, and ceasing exposure to a first language (L1) after this period (but not before) typically does not lead to substantial loss of L1 proficiency. It is unknown whether these CP effects result from innately determined brain maturation or as a stabilization of neural connections naturally induced by experience. In this study, we use language models (LMs) to test the extent to which these phenomena are peculiar to humans, or shared by a broader class of language learners. We vary the age of exposure by training LMs on language pairs in various experimental conditions, and find that LMs, which lack any direct analog to innate maturational stages, do not show CP effects when trained sequentially on L1 and L2. Our results contradict the claim that CP effects are an inevitable result of learning in statistical learners, and they are consistent with an innate mechanism for CP effects. We show that we can reverse-engineer the CP by introducing a regularizer partway through training to simulate a maturational decrease in plasticity. All in all, our results suggest that L1 learning on its own may not be enough to induce a CP, and additional engineering is necessary to make language models more cognitively plausible.

Via

Access Paper or Ask Questions

The Foundations of Tokenization: Statistical and Computational Concerns

Jul 16, 2024

Juan Luis Gastaldi, John Terilla, Luca Malagutti, Brian DuSell, Tim Vieira, Ryan Cotterell

Abstract:Tokenization - the practice of converting strings of characters over an alphabet into sequences of tokens over a vocabulary - is a critical yet under-theorized step in the NLP pipeline. Notably, it remains the only major step not fully integrated into widely used end-to-end neural models. This paper aims to address this theoretical gap by laying the foundations of tokenization from a formal perspective. By articulating and extending basic properties about the category of stochastic maps, we propose a unified framework for representing and analyzing tokenizer models. This framework allows us to establish general conditions for the use of tokenizers. In particular, we formally establish the necessary and sufficient conditions for a tokenizer model to preserve the consistency of statistical estimators. Additionally, we discuss statistical and computational concerns crucial for the design and implementation of tokenizer models. The framework and results advanced in this paper represent a step toward a robust theoretical foundation for neural language modeling.

Via

Access Paper or Ask Questions

Variational Best-of-N Alignment

Jul 08, 2024

Afra Amini, Tim Vieira, Ryan Cotterell

Figure 1 for Variational Best-of-N Alignment

Figure 2 for Variational Best-of-N Alignment

Figure 3 for Variational Best-of-N Alignment

Figure 4 for Variational Best-of-N Alignment

Abstract:Best-of-N (BoN) is a popular and effective algorithm for aligning language models to human preferences. The algorithm works as follows: at inference time, N samples are drawn from the language model, and the sample with the highest reward, as judged by a reward model, is returned as the output. Despite its effectiveness, BoN is computationally expensive; it reduces sampling throughput by a factor of N. To make BoN more efficient at inference time, one strategy is to fine-tune the language model to mimic what BoN does during inference. To achieve this, we derive the distribution induced by the BoN algorithm. We then propose to fine-tune the language model to minimize backward KL divergence to the BoN distribution. Our approach is analogous to mean-field variational inference and, thus, we term it variational BoN (vBoN). To the extent this fine-tuning is successful and we end up with a good approximation, we have reduced the inference cost by a factor of N. Our experiments on a controlled generation task suggest that while variational BoN is not as effective as BoN in aligning language models, it is close to BoN performance as vBoN appears more often on the Pareto frontier of reward and KL divergence compared to models trained with KL-constrained RL objective.

Via

Access Paper or Ask Questions

On the Representational Capacity of Neural Language Models with Chain-of-Thought Reasoning

Jun 20, 2024

Franz Nowak, Anej Svete, Alexandra Butoi, Ryan Cotterell

Abstract:The performance of modern language models (LMs) has been improved by chain-of-thought (CoT) reasoning, i.e., the process of generating intermediate results that guide the model towards a final answer. A possible explanation for this improvement is that CoT reasoning extends an LM's computational power, as RNNs and transformers with additional scratch space are known to be Turing complete. Comparing LMs to Turing machines, however, introduces a category error - Turing machines decide language membership, whereas LMs define distributions over strings. To bridge this gap, we formalize CoT reasoning in a probabilistic setting. We present several results on the representational capacity of recurrent and transformer LMs with CoT reasoning, showing that they can represent the same family of distributions over strings as probabilistic Turing machines.

* To be published at ACL 2024

Via

Access Paper or Ask Questions

A Fundamental Trade-off in Aligned Language Models and its Relation to Sampling Adaptors

Jun 14, 2024

Naaman Tan, Josef Valvoda, Anej Svete, Tianyu Liu, Yanxia Qin, Kan Min-Yen, Ryan Cotterell

Figure 1 for A Fundamental Trade-off in Aligned Language Models and its Relation to Sampling Adaptors

Figure 2 for A Fundamental Trade-off in Aligned Language Models and its Relation to Sampling Adaptors

Figure 3 for A Fundamental Trade-off in Aligned Language Models and its Relation to Sampling Adaptors

Figure 4 for A Fundamental Trade-off in Aligned Language Models and its Relation to Sampling Adaptors

Abstract:The relationship between the quality of a string and its probability $p(\boldsymbol{y})$ under a language model has been influential in the development of techniques to build good text generation systems. For example, several decoding algorithms have been motivated to manipulate $p(\boldsymbol{y})$ to produce higher-quality text. In this work, we examine the probability--quality relationship in language models explicitly aligned to human preferences, e.g., through Reinforcement Learning through Human Feedback (RLHF). We find that, given a general language model and its aligned version, for corpora sampled from an aligned language model, there exists a trade-off between the average reward and average log-likelihood of the strings under the general language model. We provide a formal treatment of this issue and demonstrate how a choice of sampling adaptor allows for a selection of how much likelihood we exchange for the reward.

Via

Access Paper or Ask Questions