Tom Everitt

DeepMind

REALab: An Embedded Perspective on Tampering

Nov 17, 2020
Via arXiv

The Incentives that Shape Behaviour

Jan 20, 2020
Via arXiv

Reward Tampering Problems and Solutions in Reinforcement Learning: A Causal Influence Diagram Perspective

Aug 20, 2019
Via arXiv

Modeling AGI Safety Frameworks with Causal Influence Diagrams

Jun 20, 2019
Via arXiv

Understanding Agent Incentives using Causal Influence Diagrams. Part I: Single Action Settings

Mar 12, 2019
Via arXiv

Scalable agent alignment via reward modeling: a research direction

Nov 19, 2018
Via arXiv

AGI Safety Literature Review

May 21, 2018
Via arXiv

A Topological Approach to Meta-heuristics: Analytical Results on the BFS vs. DFS Algorithm Selection Problem

Apr 12, 2018
Via arXiv

AI Safety Gridworlds

Nov 28, 2017
Via arXiv

Reinforcement Learning with a Corrupted Reward Channel

Aug 19, 2017
Via arXiv