Abstract: Large language models (LLMs) have been shown to acquire sequence-level planning abilities during training, yet the planning behavior they exhibit at inference time often appears short-sighted and inconsistent with these capabilities. We propose a Bayesian account of this gap that grounds planning behavior in the evolving generative context: because of subtle differences between natural language and the language internalized by LLMs, accumulated self-generated context drives a planning shift during inference and thereby creates the appearance of compromised planning behavior. We further validate the proposed model through two controlled experiments: a random-generation task demonstrating constrained planning under human-written prompts and increasing planning strength as self-generated context accumulates, and a Gaussian-sampling task showing reduced initial bias when conditioning on self-generated sequences. These findings provide both a theoretical explanation and empirical evidence for how LLMs plan ahead during inference.
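For intuition only, and using notation not taken from the abstract, the Bayesian account can be sketched as follows. Let $c_{1:t}$ denote the context at step $t$ (the human prompt plus any self-generated tokens) and $\pi$ a latent plan variable; the model's effective planning behavior then follows the posterior

$$p(\pi \mid c_{1:t}) \;\propto\; p(c_{1:t} \mid \pi)\, p(\pi),$$

where the likelihood is evaluated under the language distribution the LLM has internalized. A human-written prompt that is atypical under that internal distribution yields a diffuse or displaced posterior (apparently weak planning), whereas accumulating self-generated tokens makes $c_{1:t}$ increasingly typical and sharpens the posterior around the model's own plan, i.e., the planning shift described above.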
Abstract: Incorporating individual-level cognitive priors offers an important route to personalizing neural networks, yet accurately eliciting such priors remains challenging: existing methods either fail to uniquely identify them or introduce systematic biases. Here, we introduce PriorProbe, a novel elicitation approach grounded in Markov Chain Monte Carlo with People that recovers fine-grained, individual-specific priors. Focusing on a facial expression recognition task, we apply PriorProbe to individual participants and test whether integrating the recovered priors with a state-of-the-art neural network improves its ability to predict an individual's classifications of ambiguous stimuli. The PriorProbe-derived priors yield substantial performance gains, outperforming both the neural network alone and alternative sources of priors, while preserving the network's inference on ground-truth labels. Together, these results demonstrate that PriorProbe provides a general and interpretable framework for personalizing deep neural networks.
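As background, the elicitation idea underlying Markov Chain Monte Carlo with People can be sketched in a few lines: on each trial the participant chooses between the current stimulus and a proposed perturbation, and that binary choice plays the role of the acceptance step in a Metropolis-style sampler, so the chain of accepted stimuli converges toward the participant's internal distribution. The sketch below is a generic MCMCP loop with a simulated participant; it is illustrative only, not the paper's PriorProbe implementation, and the one-dimensional stimulus, step size, and simulated preference are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def participant_choice(current, proposal):
    """Stand-in for a human trial: choose whichever stimulus looks more like
    the target category. This simulated 'participant' prefers values near 0.7,
    with choice probabilities given by a Luce/Barker rule (illustrative only)."""
    def score(x):
        return np.exp(-0.5 * ((x - 0.7) / 0.15) ** 2)
    p_proposal = score(proposal) / (score(proposal) + score(current))
    return proposal if rng.random() < p_proposal else current

def mcmcp_chain(n_trials=2000, step=0.1, start=0.0):
    """Generic MCMC-with-People loop: propose a perturbed stimulus and let the
    participant's binary choice act as the acceptance step of the sampler."""
    current = start
    samples = [current]
    for _ in range(n_trials):
        proposal = current + rng.normal(0.0, step)
        current = participant_choice(current, proposal)
        samples.append(current)
    return np.array(samples)

chain = mcmcp_chain()
# After burn-in, the histogram of the chain approximates the participant's
# internal distribution over the stimulus dimension.
print(chain[500:].mean(), chain[500:].std())
```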