Abstract: Transformers pretrained via next-token prediction learn to factor their world into parts, representing these factors in orthogonal subspaces of the residual stream. We formalize two representational hypotheses: (1) a representation in the product space of all factors, whose dimension grows exponentially with the number of parts, or (2) a factored representation in orthogonal subspaces, whose dimension grows linearly. The factored representation is lossless when the factors are conditionally independent, but sacrifices predictive fidelity otherwise, creating a tradeoff between dimensional efficiency and accuracy. We derive precise predictions about the geometric structure of activations under each hypothesis, including the number of subspaces, their dimensionality, and the arrangement of context embeddings within them. We test these hypotheses on transformers trained on synthetic processes with known latent structure. Models learn factored representations when the factors are conditionally independent, and continue to favor them early in training even when noise or hidden dependencies undermine conditional independence, reflecting an inductive bias toward factoring at the cost of fidelity. This provides a principled explanation for why transformers decompose the world into parts, and suggests that interpretable low-dimensional structure may persist even in models trained on complex data.
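The dimension counting behind the two hypotheses can be made concrete with a generic example (the symbols $k$ and $m$ below are illustrative placeholders, not quantities taken from the paper): if a latent state decomposes into $k$ factors that each take $m$ values, a code assigning a separate direction to every joint configuration needs $m^{k}$ dimensions, whereas assigning each factor its own orthogonal subspace needs only $k\,m$.

\[
\underbrace{\dim_{\text{product}} = m^{k}}_{\text{one direction per joint configuration}}
\qquad\text{vs.}\qquad
\underbrace{\dim_{\text{factored}} = \sum_{i=1}^{k} m = k\,m}_{\text{one orthogonal subspace per factor}}
\]

For instance, with $k = 10$ binary factors ($m = 2$), the product representation requires $2^{10} = 1024$ dimensions, while the factored representation requires only $20$.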
Abstract: Markedly increased computational power and data acquisition have led to growing interest in data-driven inverse dynamics problems. These seek to answer a fundamental question: what can we learn from time-series measurements of a complex dynamical system? For small systems interacting with external environments, the effective dynamics are inherently stochastic, making it crucial to properly manage noise in the data. Here, we explore this for systems obeying Langevin dynamics and, using currents, construct a learning framework for stochastic modeling. Currents have recently gained attention for their role in bounding entropy production (EP) via thermodynamic uncertainty relations (TURs). We introduce a fundamental relationship between these cumulant currents and standard machine-learning loss functions. Using this, we derive loss functions for several key thermodynamic quantities directly from the system dynamics, without the common intermediate step of deriving a TUR. These loss functions reproduce results obtained both from TURs and from other methods. More significantly, they open a path to discovering new loss functions for previously inaccessible quantities. Notably, this includes access to per-trajectory entropy production, even when the observed system is driven far from its steady state. We also consider higher-order estimation. Our method is straightforward and unifies dynamical inference with recent approaches to entropy production estimation. Taken together, this reveals a deep connection between diffusion models in machine learning and entropy production estimation in stochastic thermodynamics.
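As background on the TUR-based route that this abstract contrasts with, the standard steady-state thermodynamic uncertainty relation bounds the entropy production $\Sigma$ over an observation interval by the statistics of any time-integrated current $J$ (shown here for context, in units where $k_B = 1$, not as the paper's derived loss):

\[
\Sigma \;\ge\; \frac{2\,\langle J\rangle^{2}}{\operatorname{Var}(J)} .
\]

TUR-based estimators maximize the right-hand side over a family of currents to obtain a lower bound on $\Sigma$; the loss functions described above are instead derived directly from the Langevin dynamics, bypassing this intermediate bound.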