Frank Yingjie Huo

Jekyll-and-Hyde Tipping Point in an AI's Behavior

Apr 29, 2025

Neil F. Johnson, Frank Yingjie Huo

Abstract: Trust in AI is undermined by the fact that there is no science that predicts -- or that can explain to the public -- when an LLM's output (e.g. ChatGPT) is likely to tip mid-response to become wrong, misleading, irrelevant or dangerous. With deaths and trauma already being blamed on LLMs, this uncertainty is even pushing people to treat their 'pet' LLM more politely to 'dissuade' it (or its future Artificial General Intelligence offspring) from suddenly turning on them. Here we address this acute need by deriving from first principles an exact formula for when a Jekyll-and-Hyde tipping point occurs at LLMs' most basic level. Requiring only secondary school mathematics, it shows the cause to be the AI's attention spreading so thin it suddenly snaps. This exact formula provides quantitative predictions for how the tipping point can be delayed or prevented by changing the prompt and the AI's training. Tailored generalizations will provide policymakers and the public with a firm platform for discussing any of AI's broader uses and risks, e.g. as a personal counselor, medical advisor, decision-maker for when to use force in a conflict situation. It also meets the need for clear and transparent answers to questions like "should I be polite to my LLM?"

Capturing AI's Attention: Physics of Repetition, Hallucination, Bias and Beyond

Apr 06, 2025

Frank Yingjie Huo, Neil F. Johnson

Abstract: We derive a first-principles physics theory of the AI engine at the heart of LLMs' 'magic' (e.g. ChatGPT, Claude): the basic Attention head. The theory allows a quantitative analysis of outstanding AI challenges such as output repetition, hallucination and harmful content, and bias (e.g. from training and fine-tuning). Its predictions are consistent with large-scale LLM outputs. Its 2-body form suggests why LLMs work so well, but hints that a generalized 3-body Attention would make such AI work even better. Its similarity to a spin-bath means that existing Physics expertise could immediately be harnessed to help Society ensure AI is trustworthy and resilient to manipulation.

* Comments welcome to neiljohnson@gwu.edu
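
For readers unfamiliar with the "basic Attention head" named in the abstract, the following is a minimal sketch of one, assuming the standard scaled dot-product form with a softmax over pairwise query-key scores. The abstract does not spell out its exact formulation, so the function names, the toy vectors, and the scaling choice here are illustrative assumptions and not the authors' model.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_head(query, keys, values):
    """Single-head scaled dot-product attention for one query vector.

    Assumed (standard) form: weights = softmax(q . k_j / sqrt(d)),
    output = sum_j weights[j] * values[j].
    """
    scale = math.sqrt(len(query))
    # Each score is a pairwise (query, key) interaction.
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output

# Hypothetical toy inputs: three context tokens with 2-dimensional embeddings.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [1.0, 0.0]

weights, output = attention_head(query, keys, values)
```

Because the weights always sum to one, every additional context token competes for the same fixed budget of attention; the sketch only illustrates that normalization and makes no claim about the papers' quantitative results.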
