Abstract:Transformers pretrained via next token prediction learn to factor their world into parts, representing these factors in orthogonal subspaces of the residual stream. We formalize two representational hypotheses: (1) a representation in the product space of all factors, whose dimension grows exponentially with the number of parts, or (2) a factored representation in orthogonal subspaces, whose dimension grows linearly. The factored representation is lossless when factors are conditionally independent, but sacrifices predictive fidelity otherwise, creating a tradeoff between dimensional efficiency and accuracy. We derive precise predictions about the geometric structure of activations for each, including the number of subspaces, their dimensionality, and the arrangement of context embeddings within them. We test between these hypotheses on transformers trained on synthetic processes with known latent structure. Models learn factored representations when factors are conditionally independent, and continue to favor them early in training even when noise or hidden dependencies undermine conditional independence, reflecting an inductive bias toward factoring at the cost of fidelity. This provides a principled explanation for why transformers decompose the world into parts, and suggests that interpretable low dimensional structure may persist even in models trained on complex data.




Abstract:Key to biological success, the requisite variety that confronts an adaptive organism is the set of detectable, accessible, and controllable states in its environment. We analyze its role in the thermodynamic functioning of information ratchets---a form of autonomous Maxwellian Demon capable of exploiting fluctuations in an external information reservoir to harvest useful work from a thermal bath. This establishes a quantitative paradigm for understanding how adaptive agents leverage structured thermal environments for their own thermodynamic benefit. General ratchets behave as memoryful communication channels, interacting with their environment sequentially and storing results to an output. The bulk of thermal ratchets analyzed to date, however, assume memoryless environments that generate input signals without temporal correlations. Employing computational mechanics and a new information-processing Second Law of Thermodynamics (IPSL) we remove these restrictions, analyzing general finite-state ratchets interacting with structured environments that generate correlated input signals. On the one hand, we demonstrate that a ratchet need not have memory to exploit an uncorrelated environment. On the other, and more appropriate to biological adaptation, we show that a ratchet must have memory to most effectively leverage structure and correlation in its environment. The lesson is that to optimally harvest work a ratchet's memory must reflect the input generator's memory. Finally, we investigate achieving the IPSL bounds on the amount of work a ratchet can extract from its environment, discovering that finite-state, optimal ratchets are unable to reach these bounds. In contrast, we show that infinite-state ratchets can go well beyond these bounds by utilizing their own infinite "negentropy". We conclude with an outline of the collective thermodynamics of information-ratchet swarms.