Jan Leike

Hidden Incentives for Auto-Induced Distributional Shift

Sep 19, 2020

Quantifying Differences in Reward Functions

Jun 24, 2020

Pitfalls of learning a reward function online

Apr 28, 2020

Learning Human Objectives by Evaluating Hypothetical Behavior

Dec 05, 2019

Scaling shared model governance via model splitting

Dec 14, 2018

Scalable agent alignment via reward modeling: a research direction

Nov 19, 2018

Reward learning from human preferences and demonstrations in Atari

Nov 15, 2018

Learning to Understand Goal Specifications by Modelling Reward

Oct 02, 2018

AI Safety Gridworlds

Nov 28, 2017

Deep reinforcement learning from human preferences

Jul 13, 2017