Shane Legg

Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety

Jul 15, 2025

An Approach to Technical AGI Safety and Security

Apr 02, 2025

Levels of AGI: Operationalizing Progress on the Path to AGI

Nov 04, 2023

The Hydra Effect: Emergent Self-repair in Language Model Computations

Jul 28, 2023

Randomized Positional Encodings Boost Length Generalization of Transformers

May 26, 2023

Beyond Bayes-optimality: meta-learning what you know you don't know

Oct 12, 2022

Neural Networks and the Chomsky Hierarchy

Jul 05, 2022

Your Policy Regularizer is Secretly an Adversary

Apr 01, 2022

Safe Deep RL in 3D Environments using Human Feedback

Jan 21, 2022

Model-Free Risk-Sensitive Reinforcement Learning

Nov 04, 2021