Adam Gleave

Can Go AIs be adversarially robust?

Jun 18, 2024
Via arXiv

Uncovering Latent Human Wellbeing in Language Model Embeddings

Feb 19, 2024
Via arXiv

Exploiting Novel GPT-4 APIs

Dec 21, 2023
Via arXiv

STARC: A General Framework For Quantifying Differences Between Reward Functions

Sep 26, 2023
Via arXiv

On The Fragility of Learned Reward Functions

Jan 09, 2023
Via arXiv

imitation: Clean Imitation Learning Implementations

Nov 22, 2022
Via arXiv

Adversarial Policies Beat Professional-Level Go AIs

Nov 01, 2022
Via arXiv

Calculus on MDPs: Potential Shaping as a Gradient

Aug 20, 2022
Via arXiv

Reducing Exploitability with Population Based Training

Aug 10, 2022
Via arXiv

Preprocessing Reward Functions for Interpretability

Mar 25, 2022
Via arXiv