Alexander Matt Turner

Distillation Robustifies Unlearning

Jun 06, 2025

An Approach to Technical AGI Safety and Security

Apr 02, 2025

Gradient Routing: Masking Gradients to Localize Computation in Neural Networks

Oct 06, 2024

Steering Llama 2 via Contrastive Activation Addition

Dec 09, 2023

Understanding and Controlling a Maze-Solving Policy Network

Oct 12, 2023

Activation Addition: Steering Language Models Without Optimization

Sep 01, 2023

Parametrically Retargetable Decision-Makers Tend To Seek Power

Jun 27, 2022

Formalizing the Problem of Side Effect Regularization

Jun 24, 2022

On Avoiding Power-Seeking by Artificial Intelligence

Jun 23, 2022

Avoiding Side Effects in Complex Environments

Jun 11, 2020