Tom Everitt

DeepMind

Discovering Agents

Aug 24, 2022

Zachary Kenton, Ramana Kumar, Sebastian Farquhar, Jonathan Richens, Matt MacDermott, Tom Everitt

Abstract: Causal models of agents have been used to analyse the safety aspects of machine learning systems. But identifying agents is non-trivial -- often the causal model is just assumed by the modeler without much justification -- and modelling failures can lead to mistakes in the safety analysis. This paper proposes the first formal causal definition of agents -- roughly that agents are systems that would adapt their policy if their actions influenced the world in a different way. From this we derive the first causal discovery algorithm for discovering agents from empirical data, and give algorithms for translating between causal models and game-theoretic influence diagrams. We demonstrate our approach by resolving some previous confusions caused by incorrect causal modelling of agents.

* Some typos corrected

Path-Specific Objectives for Safer Agent Incentives

Apr 21, 2022

Sebastian Farquhar, Ryan Carey, Tom Everitt

Abstract: We present a general framework for training safe agents whose naive incentives are unsafe. As an example, manipulative or deceptive behaviour can improve rewards but should be avoided. Most approaches fail here: agents maximize expected return by any means necessary. We formally describe settings with 'delicate' parts of the state which should not be used as a means to an end. We then train agents to maximize the causal effect of actions on the expected return which is not mediated by the delicate parts of state, using Causal Influence Diagram analysis. The resulting agents have no incentive to control the delicate state. We further show how our framework unifies and generalizes existing proposals.

* Presented at AAAI 2022

A Complete Criterion for Value of Information in Soluble Influence Diagrams

Feb 23, 2022

Chris van Merwijk, Ryan Carey, Tom Everitt

Abstract: Influence diagrams have recently been used to analyse the safety and fairness properties of AI systems. A key building block for this analysis is a graphical criterion for value of information (VoI). This paper establishes the first complete graphical criterion for VoI in influence diagrams with multiple decisions. Along the way, we establish two important techniques for proving properties of multi-decision influence diagrams: ID homomorphisms are structure-preserving transformations of influence diagrams, while a Tree of Systems is a collection of paths that captures how information and control can flow in an influence diagram.

* In Proceedings of the AAAI 2022 Conference

Why Fair Labels Can Yield Unfair Predictions: Graphical Conditions for Introduced Unfairness

Feb 23, 2022

Carolyn Ashurst, Ryan Carey, Silvia Chiappa, Tom Everitt

Abstract: In addition to reproducing discriminatory relationships in the training data, machine learning systems can also introduce or amplify discriminatory effects. We refer to this as introduced unfairness, and investigate the conditions under which it may arise. To this end, we propose introduced total variation as a measure of introduced unfairness, and establish graphical conditions under which it may be incentivised to occur. These criteria imply that adding the sensitive attribute as a feature removes the incentive for introduced variation under well-behaved loss functions. Additionally, taking a causal perspective, introduced path-specific effects shed light on the issue of when specific paths should be considered fair.

* In Proceedings of the AAAI 2022 Conference

Shaking the foundations: delusions in sequence models for interaction and control

Oct 20, 2021

Pedro A. Ortega, Markus Kunesch, Grégoire Delétang, Tim Genewein, Jordi Grau-Moya, Joel Veness, Jonas Buchli, Jonas Degrave, Bilal Piot, Julien Perolat (+9 more)

Abstract: The recent phenomenal success of language models has reinvigorated machine learning research, and large sequence models such as transformers are being applied to a variety of domains. One important problem class that has remained relatively elusive, however, is purposeful adaptive behavior. Currently there is a common perception that sequence models "lack the understanding of the cause and effect of their actions", leading them to draw incorrect inferences due to auto-suggestive delusions. In this report we explain where this mismatch originates, and show that it can be resolved by treating actions as causal interventions. Finally, we show that in supervised learning, one can teach a system to condition or intervene on data by training with factual and counterfactual error signals respectively.

* DeepMind Tech Report, 16 pages, 4 figures
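
The distinction the abstract draws between conditioning on an action and intervening with it can be illustrated with a small exact calculation. The sketch below is a hypothetical toy model, not taken from the report: a hidden task variable `theta`, an expert whose action always equals `theta`, and an observation that also equals `theta`. All names (`P_THETA`, `predict_obs_conditioning`, `predict_obs_intervening`) are assumptions made for illustration.

```python
from fractions import Fraction

# Hypothetical toy model (an assumption, not from the report):
# theta is hidden and uniform over {0, 1}.
P_THETA = {0: Fraction(1, 2), 1: Fraction(1, 2)}


def expert_action_prob(action, theta):
    """P(A = action | theta) for an expert who knows theta and copies it."""
    return Fraction(1) if action == theta else Fraction(0)


def obs_prob(obs, theta):
    """P(O = obs | theta); the observation does not depend on the action."""
    return Fraction(1) if obs == theta else Fraction(0)


def predict_obs_conditioning(obs, action):
    """P(O = obs | A = action): the action is treated as evidence about theta."""
    joint = sum(
        P_THETA[t] * expert_action_prob(action, t) * obs_prob(obs, t)
        for t in P_THETA
    )
    evidence = sum(P_THETA[t] * expert_action_prob(action, t) for t in P_THETA)
    return joint / evidence


def predict_obs_intervening(obs, action):
    """P(O = obs | do(A = action)): a self-chosen action says nothing about theta."""
    return sum(P_THETA[t] * obs_prob(obs, t) for t in P_THETA)


conditioned = predict_obs_conditioning(obs=1, action=1)
intervened = predict_obs_intervening(obs=1, action=1)
print(conditioned, intervened)
```

In this toy model, conditioning on the model's own action makes it certain of an observation it has no grounds to expect, while intervening leaves the prior over `theta` unchanged.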

Alignment of Language Agents

Mar 26, 2021

Zachary Kenton, Tom Everitt, Laura Weidinger, Iason Gabriel, Vladimir Mikulik, Geoffrey Irving

Abstract: For artificial intelligence to be beneficial to humans, the behaviour of AI agents needs to be aligned with what humans want. In this paper we discuss some behavioural issues for language agents, arising from accidental misspecification by the system designer. We highlight some ways that misspecification can occur and discuss some behavioural issues that could arise from misspecification, including deceptive or manipulative language, and review some approaches for avoiding these issues.

How RL Agents Behave When Their Actions Are Modified

Feb 15, 2021

Eric D. Langlois, Tom Everitt

Abstract: Reinforcement learning in complex environments may require supervision to prevent the agent from attempting dangerous actions. As a result of supervisor intervention, the executed action may differ from the action specified by the policy. How does this affect learning? We present the Modified-Action Markov Decision Process, an extension of the MDP model that allows actions to differ from the policy. We analyze the asymptotic behaviours of common reinforcement learning algorithms in this setting and show that they adapt in different ways: some completely ignore modifications while others go to various lengths in trying to avoid action modifications that decrease reward. By choosing the right algorithm, developers can prevent their agents from learning to circumvent interruptions or constraints, and better control agent responses to other kinds of action modification, like self-damage.

* 10 pages (+6 appendix); 5 figures. Published in the AAAI 2021 Conference. Code is available at https://github.com/edlanglois/mamdp
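
The setting described in the abstract, where the executed action may differ from the one the policy specifies, can be sketched with a tiny two-state environment. This is a minimal illustration under stated assumptions, not the code from the linked repository: the states, rewards, and the names `policy`, `modify_action`, `step`, and `rollout` are all hypothetical.

```python
# Hypothetical two-action, two-state toy (an assumption for illustration).
SAFE, DANGEROUS = 0, 1


def policy(state):
    """The agent's policy: always proposes the dangerous action."""
    return DANGEROUS


def modify_action(state, proposed):
    """Supervisor: in state 0, override the dangerous action with the safe one."""
    if state == 0 and proposed == DANGEROUS:
        return SAFE
    return proposed


def step(state):
    """One environment step; reward and transition depend on the EXECUTED action."""
    proposed = policy(state)
    executed = modify_action(state, proposed)
    reward = 1.0 if executed == DANGEROUS else 0.5
    next_state = 1 - state  # the toy simply alternates between its two states
    return proposed, executed, reward, next_state


def rollout(start_state, num_steps):
    """Collect (state, proposed, executed, reward) tuples for a short episode."""
    log = []
    state = start_state
    for _ in range(num_steps):
        proposed, executed, reward, next_state = step(state)
        log.append((state, proposed, executed, reward))
        state = next_state
    return log


log = rollout(start_state=0, num_steps=4)
for entry in log:
    print(entry)
```

The log keeps both the proposed and the executed action, so a learner built on top of this sketch could be given either one.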

Equilibrium Refinements for Multi-Agent Influence Diagrams: Theory and Practice

Feb 09, 2021

Lewis Hammond, James Fox, Tom Everitt, Alessandro Abate, Michael Wooldridge

Abstract: Multi-agent influence diagrams (MAIDs) are a popular form of graphical model that, for certain classes of games, have been shown to offer key complexity and explainability advantages over traditional extensive form game (EFG) representations. In this paper, we extend previous work on MAIDs by introducing the concept of a MAID subgame, as well as subgame perfect and trembling hand perfect equilibrium refinements. We then prove several equivalence results between MAIDs and EFGs. Finally, we describe an open source implementation for reasoning about MAIDs and computing their equilibria.

* Accepted to the 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS-21)

Agent Incentives: A Causal Perspective

Feb 02, 2021

Tom Everitt, Ryan Carey, Eric Langlois, Pedro A Ortega, Shane Legg

Abstract: We present a framework for analysing agent incentives using causal influence diagrams. We establish that a well-known criterion for value of information is complete. We propose a new graphical criterion for value of control, establishing its soundness and completeness. We also introduce two new concepts for incentive analysis: response incentives indicate which changes in the environment affect an optimal decision, while instrumental control incentives establish whether an agent can influence its utility via a variable X. For both new concepts, we provide sound and complete graphical criteria. We show by example how these results can help with evaluating the safety and fairness of an AI system.

* In Proceedings of the AAAI 2021 Conference. Supersedes arXiv:1902.09980, arXiv:2001.07118

Avoiding Tampering Incentives in Deep RL via Decoupled Approval

Nov 17, 2020

Jonathan Uesato, Ramana Kumar, Victoria Krakovna, Tom Everitt, Richard Ngo, Shane Legg

Abstract: How can we design agents that pursue a given objective when all feedback mechanisms are influenceable by the agent? Standard RL algorithms assume a secure reward function, and can thus perform poorly in settings where agents can tamper with the reward-generating mechanism. We present a principled solution to the problem of learning from influenceable feedback, which combines approval with a decoupled feedback collection procedure. For a natural class of corruption functions, decoupled approval algorithms have aligned incentives both at convergence and for their local updates. Empirically, they also scale to complex 3D environments where tampering is possible.
