Adam Gleave

Scaling Laws for Data Poisoning in LLMs

Aug 06, 2024
Via arXiv

Exploring Scaling Trends in LLM Robustness

Jul 26, 2024
Via arXiv

Planning behavior in a recurrent neural network that plays Sokoban

Jul 22, 2024
Via arXiv

Can Go AIs be adversarially robust?

Jun 18, 2024
Via arXiv

Uncovering Latent Human Wellbeing in Language Model Embeddings

Feb 19, 2024
Via arXiv

Exploiting Novel GPT-4 APIs

Dec 21, 2023
Via arXiv

STARC: A General Framework For Quantifying Differences Between Reward Functions

Sep 26, 2023
Via arXiv

On The Fragility of Learned Reward Functions

Jan 09, 2023
Via arXiv

imitation: Clean Imitation Learning Implementations

Nov 22, 2022
Via arXiv

Adversarial Policies Beat Professional-Level Go AIs

Nov 01, 2022
Via arXiv