Christopher J. Earls

Jacobian Scopes: token-level causal attributions in LLMs

Jan 23, 2026

Toni J. B. Liu, Baran Zadeoğlu, Nicolas Boullé, Raphaël Sarfati, Christopher J. Earls

Abstract:Large language models (LLMs) make next-token predictions based on clues present in their context, such as semantic descriptions and in-context examples. Yet identifying which prior tokens most strongly influence a given prediction remains challenging because of the proliferation of layers and attention heads in modern architectures. We propose Jacobian Scopes, a suite of gradient-based, token-level causal attribution methods for interpreting LLM predictions. By analyzing the linearized relation between the final hidden state and the inputs, Jacobian Scopes quantify how input tokens influence a model's prediction. We introduce three variants, Semantic, Fisher, and Temperature Scopes, which respectively target the sensitivity of specific logits, of the full predictive distribution, and of model confidence (inverse temperature). Through case studies spanning instruction understanding, translation, and in-context learning (ICL), we uncover several findings, such as cases where Jacobian Scopes point to implicit political biases. We believe that our proposed methods also shed light on recently debated mechanisms underlying in-context time-series forecasting. Our code and interactive demonstrations are publicly available at https://github.com/AntonioLiu97/JacobianScopes.
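As a rough illustration of the idea behind a logit-targeted (Semantic) scope, the toy Python sketch below scores each input token by the norm of the gradient of one target logit with respect to that token's embedding. Everything here is hypothetical: the two-dimensional embeddings, the fixed mixing weights `mix` standing in for attention, and the names `toy_logit` and `jacobian_token_scores`. Finite differences replace automatic differentiation, and this is not the authors' implementation.

```python
import math


def toy_logit(embeddings, mix, W, u):
    # Hypothetical toy "model": fixed per-position weights `mix` stand in for
    # attention, followed by one tanh layer and a readout vector `u`.
    d = len(embeddings[0])
    pooled = [sum(mix[t] * embeddings[t][k] for t in range(len(embeddings)))
              for k in range(d)]
    hidden = [math.tanh(sum(row[k] * pooled[k] for k in range(d))) for row in W]
    return sum(ui * hi for ui, hi in zip(u, hidden))


def jacobian_token_scores(embeddings, mix, W, u, eps=1e-6):
    # Influence of token t = L2 norm of d(logit)/d(embedding_t), estimated
    # with central finite differences (a real model would use autodiff).
    scores = []
    for t in range(len(embeddings)):
        grad = []
        for k in range(len(embeddings[t])):
            plus = [row[:] for row in embeddings]
            minus = [row[:] for row in embeddings]
            plus[t][k] += eps
            minus[t][k] -= eps
            diff = toy_logit(plus, mix, W, u) - toy_logit(minus, mix, W, u)
            grad.append(diff / (2 * eps))
        scores.append(math.sqrt(sum(g * g for g in grad)))
    return scores


embeddings = [[0.5, -1.0], [1.0, 0.3], [-0.2, 0.8]]  # three tokens, d = 2
mix = [0.1, 0.2, 0.7]
W = [[0.9, -0.4], [0.3, 1.1]]
u = [1.0, -0.5]
scores = jacobian_token_scores(embeddings, mix, W, u)
print(scores)
```

Because this toy model mixes tokens linearly before the nonlinearity, each token's score is proportional to its mixing weight, which makes the output easy to sanity-check; in a real transformer the scores depend on the whole attention stack.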

* 12 pages, 15 figures, under review at ACL 2026

Density estimation with LLMs: a geometric investigation of in-context learning trajectories

Oct 07, 2024

Toni J. B. Liu, Nicolas Boullé, Raphaël Sarfati, Christopher J. Earls

Abstract:Large language models (LLMs) demonstrate remarkable emergent abilities to perform in-context learning across various tasks, including time-series forecasting. This work investigates LLMs' ability to estimate probability density functions (PDFs) from data observed in-context; such density estimation (DE) is a fundamental task underlying many probabilistic modeling problems. We leverage Intensive Principal Component Analysis (InPCA) to visualize and analyze the in-context learning dynamics of LLaMA-2 models. Our main finding is that these LLMs all follow similar learning trajectories in a low-dimensional InPCA space, which are distinct from those of traditional density estimation methods such as histograms and Gaussian kernel density estimation (KDE). We interpret the LLaMA in-context DE process as a KDE with an adaptive kernel width and shape. This custom kernel model captures a significant portion of LLaMA's behavior despite having only two parameters. We further speculate on why LLaMA's kernel width and shape differ from those of classical algorithms, providing insights into the mechanism of in-context probabilistic reasoning in LLMs.
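The abstract does not spell out the two-parameter kernel, so the sketch below assumes one plausible family purely for illustration: a generalized Gaussian kernel with a width and a shape exponent, averaged over the in-context samples as in ordinary KDE. Setting `shape = 2` recovers classical Gaussian KDE. The names `gen_gaussian_kernel` and `kde` and the sample values are hypothetical; in an adaptive scheme the caller would update `width` and `shape` as more samples arrive.

```python
import math


def gen_gaussian_kernel(u, width, shape):
    # K(u) = shape / (2 * width * Gamma(1/shape)) * exp(-|u / width| ** shape),
    # which integrates to 1; shape = 2 is Gaussian, shape = 1 is Laplace.
    norm = shape / (2.0 * width * math.gamma(1.0 / shape))
    return norm * math.exp(-abs(u / width) ** shape)


def kde(x, samples, width, shape=2.0):
    # Density estimate at x: the average of kernels centered on each sample.
    return sum(gen_gaussian_kernel(x - s, width, shape) for s in samples) / len(samples)


samples = [0.12, 0.35, 0.41, 0.47]  # hypothetical in-context observations
print(kde(0.4, samples, width=0.1, shape=1.5))
```

Separating width from shape lets one ask two different questions of a learning trajectory: how quickly the kernel narrows as data accumulate, and whether its tails are heavier or lighter than Gaussian.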

* Under review as a conference paper at ICLR 2025

Lines of Thought in Large Language Models

Oct 02, 2024

Raphaël Sarfati, Toni J. B. Liu, Nicolas Boullé, Christopher J. Earls

Abstract:Large Language Models achieve next-token prediction by transporting a vectorized piece of text (prompt) across an accompanying embedding space under the action of successive transformer layers. The resulting high-dimensional trajectories realize different contextualization, or 'thinking', steps, and fully determine the output probability distribution. We aim to characterize the statistical properties of ensembles of these 'lines of thought.' We observe that independent trajectories cluster along a low-dimensional, non-Euclidean manifold, and that their path can be well approximated by a stochastic equation with few parameters extracted from data. We find it remarkable that the vast complexity of such large models can be reduced to a much simpler form, and we reflect on implications.
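The abstract does not state the stochastic equation, so as a generic stand-in the sketch below uses a one-dimensional Ornstein-Uhlenbeck process, dx = -k x dt + sigma dW, to show how a drift parameter and a noise parameter can be extracted from a discretely sampled trajectory. The equation, the parameter values, and the helper names `simulate_ou` and `fit_ou` are assumptions for illustration; the trajectories studied in the paper are high-dimensional paths through the embedding space, not scalar series.

```python
import math
import random


def simulate_ou(k, sigma, x0, dt, n_steps, seed=0):
    # Euler-Maruyama simulation of dx = -k * x * dt + sigma * dW.
    rng = random.Random(seed)
    xs = [x0]
    for _ in range(n_steps):
        x = xs[-1]
        xs.append(x - k * x * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0))
    return xs


def fit_ou(xs, dt):
    # Least-squares drift estimate from the increments dx_i = -k x_i dt + noise,
    # then the noise amplitude from the variance of the residuals.
    prev = xs[:-1]
    dx = [b - a for a, b in zip(xs[:-1], xs[1:])]
    k_hat = -sum(x * d for x, d in zip(prev, dx)) / (dt * sum(x * x for x in prev))
    resid = [d + k_hat * x * dt for x, d in zip(prev, dx)]
    sigma_hat = math.sqrt(sum(r * r for r in resid) / (len(resid) * dt))
    return k_hat, sigma_hat


xs = simulate_ou(k=1.5, sigma=0.5, x0=1.0, dt=0.01, n_steps=50000)
k_hat, sigma_hat = fit_ou(xs, dt=0.01)
print(f"k_hat={k_hat:.3f}, sigma_hat={sigma_hat:.3f}")
```

The noise amplitude is typically recovered much more precisely than the drift, since every increment informs the former while the drift estimate depends on the total observation time.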

LLMs learn governing principles of dynamical systems, revealing an in-context neural scaling law

Feb 01, 2024

Toni J. B. Liu, Nicolas Boullé, Raphaël Sarfati, Christopher J. Earls

Abstract:Pretrained large language models (LLMs) are surprisingly effective at performing zero-shot tasks, including time-series forecasting. However, understanding the mechanisms behind such capabilities remains highly challenging due to the complexity of the models. In this paper, we study LLMs' ability to extrapolate the behavior of dynamical systems whose evolution is governed by principles of physical interest. Our results show that LLaMA 2, a language model trained primarily on text, achieves accurate predictions of dynamical system time series without fine-tuning or prompt engineering. Moreover, the accuracy of the learned physical rules increases with the length of the input context window, revealing an in-context version of the neural scaling law. Along the way, we present a flexible and efficient algorithm for extracting probability density functions of multi-digit numbers directly from LLMs.
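One natural way to read a distribution over multi-digit numbers off a model, assuming the tokenizer encodes numbers digit by digit as LLaMA-style tokenizers do, is to chain conditional next-digit probabilities down a tree of prefixes. The sketch below does this with a hypothetical `toy_next_digit_probs` in place of an LLM's softmax over the tokens "0" to "9". It is not the paper's algorithm: that one is described as efficient, whereas this version queries every nonzero branch exhaustively.

```python
def toy_next_digit_probs(prefix):
    # Hypothetical stand-in for an LLM's next-token distribution restricted
    # to the digit tokens "0".."9", given the digits generated so far.
    if prefix == "":
        return [0.05, 0.05, 0.05, 0.05, 0.6, 0.05, 0.05, 0.05, 0.05, 0.0]
    return [0.1] * 10


def multi_digit_pdf(next_digit_probs, n_digits, prefix=""):
    # P(d1 d2 ... dn) = prod_k P(d_k | d_1 ... d_{k-1}), built by recursing
    # down the tree of digit prefixes and skipping zero-probability branches.
    if len(prefix) == n_digits:
        return {prefix: 1.0}
    pdf = {}
    for digit, p in enumerate(next_digit_probs(prefix)):
        if p == 0.0:
            continue
        subtree = multi_digit_pdf(next_digit_probs, n_digits, prefix + str(digit))
        for string, q in subtree.items():
            pdf[string] = p * q
    return pdf


def to_density(pdf, n_digits):
    # Read each string "d1...dn" as the bin starting at 0.d1...dn with width
    # 10**-n, and convert its probability mass into a density on [0, 1).
    scale = 10 ** n_digits
    return [(int(s) / scale, p * scale) for s, p in sorted(pdf.items())]


pdf = multi_digit_pdf(toy_next_digit_probs, n_digits=2)
bins = to_density(pdf, n_digits=2)
```

Because each level multiplies by a normalized conditional distribution, the leaf probabilities always sum to one, and refining a number by one more digit splits each bin into ten.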

Bayesian Deep Learning for Partial Differential Equation Parameter Discovery with Sparse and Noisy Data

Aug 14, 2021

Christophe Bonneville, Christopher J. Earls

Abstract:Scientific machine learning has been successfully applied to inverse problems and PDE discovery in computational physics. One caveat of current methods, however, is the need for large amounts of (clean) data in order to recover full system responses or underlying physical models. Bayesian methods may be particularly promising for overcoming these challenges, as they are naturally less sensitive to sparse and noisy data. In this paper, we propose to use Bayesian neural networks (BNNs) to: 1) Recover the full system states from measurement data (e.g., temperature or velocity fields). We use Hamiltonian Monte Carlo to sample the posterior distribution of a deep and dense BNN, and show that it is possible to accurately capture physics of varying complexity without overfitting. 2) Recover the parameters of the underlying partial differential equation (PDE) governing the physical system. Using the trained BNN as a surrogate of the system response, we generate datasets of derivatives potentially comprising the latent PDE of the observed system and perform a Bayesian linear regression (BLR) between the successive derivatives in space and time to recover the original PDE parameters. We take advantage of the confidence intervals on the BNN outputs and introduce the spatial derivative variance into the BLR likelihood to discard the influence of highly uncertain surrogate data points, which allows for more accurate parameter discovery. We demonstrate our approach on a handful of examples from physics and nonlinear dynamics.
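The second step above, a Bayesian linear regression of a time derivative onto candidate spatial-derivative terms with per-point uncertainty folded into the likelihood, can be sketched as a heteroscedastic Gaussian regression. This sketch assumes each surrogate point i comes with a variance var_i (in the paper derived from the BNN's derivative variance; here simply supplied), a zero-mean Gaussian prior on the coefficients, and a hypothetical two-term library [u_xx, u*u_x] with synthetic data. The names `weighted_blr` and `invert` are illustrative, not an existing API.

```python
import random


def invert(matrix):
    # Gauss-Jordan elimination with partial pivoting; adequate for the
    # handful of candidate terms in a small PDE library.
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def weighted_blr(features, targets, variances, prior_precision=1e-6):
    # Likelihood y_i ~ N(phi_i . w, var_i); prior w ~ N(0, I / prior_precision).
    # Posterior precision A = prior_precision * I + sum_i phi_i phi_i^T / var_i,
    # posterior mean = A^{-1} sum_i phi_i y_i / var_i, posterior covariance = A^{-1}.
    m = len(features[0])
    A = [[prior_precision if i == j else 0.0 for j in range(m)] for i in range(m)]
    b = [0.0] * m
    for phi, y, var in zip(features, targets, variances):
        for i in range(m):
            b[i] += phi[i] * y / var
            for j in range(m):
                A[i][j] += phi[i] * phi[j] / var
    cov = invert(A)
    mean = [sum(cov[i][j] * b[j] for j in range(m)) for i in range(m)]
    return mean, cov


# Synthetic data for a hypothetical library [u_xx, u * u_x] with
# u_t = 0.1 * u_xx - 1.0 * u * u_x; every fifth point is "uncertain".
rng = random.Random(1)
true_coeffs = [0.1, -1.0]
features, targets, variances = [], [], []
for i in range(500):
    phi = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
    var = 1.0 if i % 5 == 0 else 1e-4
    y = sum(c * f for c, f in zip(true_coeffs, phi)) + rng.gauss(0.0, var ** 0.5)
    features.append(phi)
    targets.append(y)
    variances.append(var)

mean, cov = weighted_blr(features, targets, variances)
print(mean)
```

Points with a large variance contribute almost nothing to the posterior precision, which is how highly uncertain surrogate points end up with negligible influence on the recovered coefficients.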

Data-driven discovery of physical laws with human-understandable deep learning

May 01, 2021

Nicolas Boullé, Christopher J. Earls, Alex Townsend

Abstract:There is an opportunity for deep learning to revolutionize science and technology by revealing its findings in a human-interpretable manner. We develop a novel data-driven approach for creating a human-machine partnership to accelerate scientific discovery. By collecting physical system responses under carefully selected excitations, we train rational neural networks to learn Green's functions of hidden partial differential equations. These solutions reveal human-understandable properties and features, such as linear conservation laws and symmetries, along with shock and singularity locations, boundary effects, and dominant modes. We illustrate this technique on several examples and capture a range of physics, including advection-diffusion, viscous shocks, and Stokes flow in a lid-driven cavity.
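Once a Green's function G(x, y) is known, the response to any forcing f is u(x) = ∫ G(x, y) f(y) dy. The sketch below illustrates that step with the closed-form Green's function of -u'' = f on [0, 1] with zero boundary values, standing in for a learned rational network (an assumption for illustration); the helper names `green_poisson` and `apply_green` are hypothetical. For f = 1 the exact solution is u(x) = x(1 - x)/2, so u(0.5) = 0.125, and the symmetry G(x, y) = G(y, x) is an example of the kind of readable structure the abstract mentions.

```python
def green_poisson(x, y):
    # Green's function of -u'' = f on [0, 1] with u(0) = u(1) = 0.
    return x * (1.0 - y) if x <= y else y * (1.0 - x)


def apply_green(G, f, x, n=100):
    # u(x) = integral over [0, 1] of G(x, y) f(y) dy, by the trapezoidal rule
    # on n equal intervals.
    h = 1.0 / n
    total = 0.0
    for j in range(n + 1):
        y = j * h
        weight = 0.5 if j in (0, n) else 1.0
        total += weight * G(x, y) * f(y)
    return total * h


u_mid = apply_green(green_poisson, lambda y: 1.0, 0.5)
print(u_mid)
```

For f = 1 the integrand is piecewise linear in y with its kink at y = x, so when x lies on a grid node the trapezoidal rule reproduces the exact value up to rounding.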

* 52 pages, 22 figures
