Title: Learning the Preferences of a Learning Agent

May 09, 2026

Karim Abdel Sadek, Mark Bedaywi, Rhys Gould, Stuart Russell

Abstract: For AI systems to be useful to humans, they must understand and act in accordance with our values and preferences. Since specifying preferences is a hard task, inverse reinforcement learning (IRL) aims to develop methods that allow for inferring preferences from observed behavior. However, IRL assumes the human to be approximately optimal. This is a big limitation in cases where the human themselves may be learning to act optimally in an environment. In this paper, we formalize the problem of learning the preferences of a learning agent: a predictor observes a learner acting online and tries to infer the underlying reward function being (initially suboptimally) optimized by the learner. We model the learner as either being no-regret, or as converging to an optimal Boltzmann policy over time. In each of these settings, we establish theoretical guarantees for various preference learning algorithms, or otherwise show that such guarantees are impossible.

* Published at ICLR 2026, Workshop on Multi-Agent Learning and Its Opportunities in the Era of Generative AI. 9 pages main text
