Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yair Lakretz

UNICOG-U992

A Neural Model for Word Repetition

Jun 16, 2025

Daniel Dager, Robin Sobczyk, Emmanuel Chemla, Yair Lakretz

Abstract:It takes several years for the developing brain of a baby to fully master word repetition-the task of hearing a word and repeating it aloud. Repeating a new word, such as from a new language, can be a challenging task also for adults. Additionally, brain damage, such as from a stroke, may lead to systematic speech errors with specific characteristics dependent on the location of the brain damage. Cognitive sciences suggest a model with various components for the different processing stages involved in word repetition. While some studies have begun to localize the corresponding regions in the brain, the neural mechanisms and how exactly the brain performs word repetition remain largely unknown. We propose to bridge the gap between the cognitive model of word repetition and neural mechanisms in the human brain by modeling the task using deep neural networks. Neural models are fully observable, allowing us to study the detailed mechanisms in their various substructures and make comparisons with human behavior and, ultimately, the brain. Here, we make first steps in this direction by: (1) training a large set of models to simulate the word repetition task; (2) creating a battery of tests to probe the models for known effects from behavioral studies in humans, and (3) simulating brain damage through ablation studies, where we systematically remove neurons from the model, and repeat the behavioral study to examine the resulting speech errors in the "patient" model. Our results show that neural models can mimic several effects known from human research, but might diverge in other aspects, highlighting both the potential and the challenges for future research aimed at developing human-like neural models.

* To appear at Cognitive Computational Neuroscience 2025 (CCN)

Via

Access Paper or Ask Questions

Optimizing fMRI Data Acquisition for Decoding Natural Speech with Limited Participants

May 27, 2025

Louis Jalouzot, Alexis Thual, Yair Lakretz, Christophe Pallier, Bertrand Thirion

Abstract:We investigate optimal strategies for decoding perceived natural speech from fMRI data acquired from a limited number of participants. Leveraging Lebel et al. (2023)'s dataset of 8 participants, we first demonstrate the effectiveness of training deep neural networks to predict LLM-derived text representations from fMRI activity. Then, in this data regime, we observe that multi-subject training does not improve decoding accuracy compared to single-subject approach. Furthermore, training on similar or different stimuli across subjects has a negligible effect on decoding accuracy. Finally, we find that our decoders better model syntactic than semantic features, and that stories containing sentences with complex syntax or rich semantic content are more challenging to decode. While our results demonstrate the benefits of having extensive data per participant (deep phenotyping), they suggest that leveraging multi-subject for natural speech decoding likely requires deeper phenotyping or a substantially larger cohort.

* 13 pages, 6 figures

Via

Access Paper or Ask Questions

Meta-Learning Neural Mechanisms rather than Bayesian Priors

Mar 20, 2025

Michael Goodale, Salvador Mascarenhas, Yair Lakretz

Abstract:Children acquire language despite being exposed to several orders of magnitude less data than large language models require. Meta-learning has been proposed as a way to integrate human-like learning biases into neural-network architectures, combining both the structured generalizations of symbolic models with the scalability of neural-network models. But what does meta-learning exactly imbue the model with? We investigate the meta-learning of formal languages and find that, contrary to previous claims, meta-trained models are not learning simplicity-based priors when meta-trained on datasets organised around simplicity. Rather, we find evidence that meta-training imprints neural mechanisms (such as counters) into the model, which function like cognitive primitives for the network on downstream tasks. Most surprisingly, we find that meta-training on a single formal language can provide as much improvement to a model as meta-training on 5000 different formal languages, provided that the formal language incentivizes the learning of useful neural mechanisms. Taken together, our findings provide practical implications for efficient meta-learning paradigms and new theoretical insights into linking symbolic theories and neural mechanisms.

Via

Access Paper or Ask Questions

Disentanglement and Compositionality of Letter Identity and Letter Position in Variational Auto-Encoder Vision Models

Dec 11, 2024

Bruno Bianchi, Aakash Agrawal, Stanislas Dehaene, Emmanuel Chemla, Yair Lakretz

Figure 1 for Disentanglement and Compositionality of Letter Identity and Letter Position in Variational Auto-Encoder Vision Models

Figure 2 for Disentanglement and Compositionality of Letter Identity and Letter Position in Variational Auto-Encoder Vision Models

Figure 3 for Disentanglement and Compositionality of Letter Identity and Letter Position in Variational Auto-Encoder Vision Models

Figure 4 for Disentanglement and Compositionality of Letter Identity and Letter Position in Variational Auto-Encoder Vision Models

Abstract:Human readers can accurately count how many letters are in a word (e.g., 7 in ``buffalo''), remove a letter from a given position (e.g., ``bufflo'') or add a new one. The human brain of readers must have therefore learned to disentangle information related to the position of a letter and its identity. Such disentanglement is necessary for the compositional, unbounded, ability of humans to create and parse new strings, with any combination of letters appearing in any positions. Do modern deep neural models also possess this crucial compositional ability? Here, we tested whether neural models that achieve state-of-the-art on disentanglement of features in visual input can also disentangle letter position and letter identity when trained on images of written words. Specifically, we trained beta variational autoencoder ($\beta$-VAE) to reconstruct images of letter strings and evaluated their disentanglement performance using CompOrth - a new benchmark that we created for studying compositional learning and zero-shot generalization in visual models for orthography. The benchmark suggests a set of tests, of increasing complexity, to evaluate the degree of disentanglement between orthographic features of written words in deep neural models. Using CompOrth, we conducted a set of experiments to analyze the generalization ability of these models, in particular, to unseen word length and to unseen combinations of letter identities and letter positions. We found that while models effectively disentangle surface features, such as horizontal and vertical `retinal' locations of words within an image, they dramatically fail to disentangle letter position and letter identity and lack any notion of word length. Together, this study demonstrates the shortcomings of state-of-the-art $\beta$-VAE models compared to humans and proposes a new challenge and a corresponding benchmark to evaluate neural models.

Via

Access Paper or Ask Questions

A polar coordinate system represents syntax in large language models

Dec 07, 2024

Pablo Diego-Simón, Stéphane D'Ascoli, Emmanuel Chemla, Yair Lakretz, Jean-Rémi King

Figure 1 for A polar coordinate system represents syntax in large language models

Figure 2 for A polar coordinate system represents syntax in large language models

Figure 3 for A polar coordinate system represents syntax in large language models

Figure 4 for A polar coordinate system represents syntax in large language models

Abstract:Originally formalized with symbolic representations, syntactic trees may also be effectively represented in the activations of large language models (LLMs). Indeed, a 'Structural Probe' can find a subspace of neural activations, where syntactically related words are relatively close to one-another. However, this syntactic code remains incomplete: the distance between the Structural Probe word embeddings can represent the existence but not the type and direction of syntactic relations. Here, we hypothesize that syntactic relations are, in fact, coded by the relative direction between nearby embeddings. To test this hypothesis, we introduce a 'Polar Probe' trained to read syntactic relations from both the distance and the direction between word embeddings. Our approach reveals three main findings. First, our Polar Probe successfully recovers the type and direction of syntactic relations, and substantially outperforms the Structural Probe by nearly two folds. Second, we confirm that this polar coordinate system exists in a low-dimensional subspace of the intermediate layers of many LLMs and becomes increasingly precise in the latest frontier models. Third, we demonstrate with a new benchmark that similar syntactic relations are coded similarly across the nested levels of syntactic trees. Overall, this work shows that LLMs spontaneously learn a geometry of neural activations that explicitly represents the main symbolic structures of linguistic theory.

* NeurIPS 2024

Via

Access Paper or Ask Questions

What Makes Two Language Models Think Alike?

Jun 24, 2024

Jeanne Salle, Louis Jalouzot, Nur Lan, Emmanuel Chemla, Yair Lakretz

Figure 1 for What Makes Two Language Models Think Alike?

Figure 2 for What Makes Two Language Models Think Alike?

Figure 3 for What Makes Two Language Models Think Alike?

Figure 4 for What Makes Two Language Models Think Alike?

Abstract:Do architectural differences significantly affect the way models represent and process language? We propose a new approach, based on metric-learning encoding models (MLEMs), as a first step to answer this question. The approach provides a feature-based comparison of how any two layers of any two models represent linguistic information. We apply the method to BERT, GPT-2 and Mamba. Unlike previous methods, MLEMs offer a transparent comparison, by identifying the specific linguistic features responsible for similarities and differences. More generally, the method uses formal, symbolic descriptions of a domain, and use these to compare neural representations. As such, the approach can straightforwardly be extended to other domains, such as speech and vision, and to other neural systems, including human brains.

* 7 pages, 6 figures

Via

Access Paper or Ask Questions

What makes two models think alike?

Jun 18, 2024

Jeanne Salle, Louis Jalouzot, Nur Lan, Emmanuel Chemla, Yair Lakretz

Figure 1 for What makes two models think alike?

Figure 2 for What makes two models think alike?

Figure 3 for What makes two models think alike?

Figure 4 for What makes two models think alike?

* 7 pages, 6 figures

Via

Access Paper or Ask Questions

Metric-Learning Encoding Models Identify Processing Profiles of Linguistic Features in BERT's Representations

Feb 18, 2024

Louis Jalouzot, Robin Sobczyk, Bastien Lhopitallier, Jeanne Salle, Nur Lan, Emmanuel Chemla, Yair Lakretz

Figure 1 for Metric-Learning Encoding Models Identify Processing Profiles of Linguistic Features in BERT's Representations

Figure 2 for Metric-Learning Encoding Models Identify Processing Profiles of Linguistic Features in BERT's Representations

Figure 3 for Metric-Learning Encoding Models Identify Processing Profiles of Linguistic Features in BERT's Representations

Figure 4 for Metric-Learning Encoding Models Identify Processing Profiles of Linguistic Features in BERT's Representations

Abstract:We introduce Metric-Learning Encoding Models (MLEMs) as a new approach to understand how neural systems represent the theoretical features of the objects they process. As a proof-of-concept, we apply MLEMs to neural representations extracted from BERT, and track a wide variety of linguistic features (e.g., tense, subject person, clause type, clause embedding). We find that: (1) linguistic features are ordered: they separate representations of sentences to different degrees in different layers; (2) neural representations are organized hierarchically: in some layers, we find clusters of representations nested within larger clusters, following successively important linguistic features; (3) linguistic features are disentangled in middle layers: distinct, selective units are activated by distinct linguistic features. Methodologically, MLEMs are superior (4) to multivariate decoding methods, being more robust to type-I errors, and (5) to univariate encoding methods, in being able to predict both local and distributed representations. Together, this demonstrates the utility of Metric-Learning Encoding Methods for studying how linguistic features are neurally encoded in language models and the advantage of MLEMs over traditional methods. MLEMs can be extended to other domains (e.g. vision) and to other neural systems, such as the human brain.

* 17 pages, 13 figures

Via

Access Paper or Ask Questions

Language acquisition: do children and language models follow similar learning stages?

Jun 06, 2023

Linnea Evanson, Yair Lakretz, Jean-Rémi King

Figure 1 for Language acquisition: do children and language models follow similar learning stages?

Figure 2 for Language acquisition: do children and language models follow similar learning stages?

Figure 3 for Language acquisition: do children and language models follow similar learning stages?

Figure 4 for Language acquisition: do children and language models follow similar learning stages?

Abstract:During language acquisition, children follow a typical sequence of learning stages, whereby they first learn to categorize phonemes before they develop their lexicon and eventually master increasingly complex syntactic structures. However, the computational principles that lead to this learning trajectory remain largely unknown. To investigate this, we here compare the learning trajectories of deep language models to those of children. Specifically, we test whether, during its training, GPT-2 exhibits stages of language acquisition comparable to those observed in children aged between 18 months and 6 years. For this, we train 48 GPT-2 models from scratch and evaluate their syntactic and semantic abilities at each training step, using 96 probes curated from the BLiMP, Zorro and BIG-Bench benchmarks. We then compare these evaluations with the behavior of 54 children during language production. Our analyses reveal three main findings. First, similarly to children, the language models tend to learn linguistic skills in a systematic order. Second, this learning scheme is parallel: the language tasks that are learned last improve from the very first training steps. Third, some - but not all - learning stages are shared between children and these language models. Overall, these results shed new light on the principles of language acquisition, and highlight important divergences in how humans and modern algorithms learn to process natural language.

* Accepted to ACL 2023. *Equal Contribution

Via

Access Paper or Ask Questions

Probing Brain Context-Sensitivity with Masked-Attention Generation

May 23, 2023

Alexandre Pasquiou, Yair Lakretz, Bertrand Thirion, Christophe Pallier

Abstract:Two fundamental questions in neurolinguistics concerns the brain regions that integrate information beyond the lexical level, and the size of their window of integration. To address these questions we introduce a new approach named masked-attention generation. It uses GPT-2 transformers to generate word embeddings that capture a fixed amount of contextual information. We then tested whether these embeddings could predict fMRI brain activity in humans listening to naturalistic text. The results showed that most of the cortex within the language network is sensitive to contextual information, and that the right hemisphere is more sensitive to longer contexts than the left. Masked-attention generation supports previous analyses of context-sensitivity in the brain, and complements them by quantifying the window size of context integration per voxel.

* CCN 2023
* 2 pages, 2 figures, CCN 2023

Via

Access Paper or Ask Questions