Alexander Matt Turner

Steering Llama 2 via Contrastive Activation Addition

Dec 09, 2023
Nina Rimsky, Nick Gabrieli, Julian Schulz, Meg Tong, Evan Hubinger, Alexander Matt Turner

Understanding and Controlling a Maze-Solving Policy Network

Oct 12, 2023
Ulisse Mini, Peli Grietzer, Mrinank Sharma, Austin Meek, Monte MacDiarmid, Alexander Matt Turner

Activation Addition: Steering Language Models Without Optimization

Sep 01, 2023
Alexander Matt Turner, Lisa Thiergart, David Udell, Gavin Leech, Ulisse Mini, Monte MacDiarmid

Parametrically Retargetable Decision-Makers Tend To Seek Power

Jun 27, 2022
Alexander Matt Turner, Prasad Tadepalli

Formalizing the Problem of Side Effect Regularization

Jun 24, 2022
Alexander Matt Turner, Aseem Saxena, Prasad Tadepalli

On Avoiding Power-Seeking by Artificial Intelligence

Jun 23, 2022
Alexander Matt Turner

Formalizing the Problem of Side-Effect Avoidance

Jun 23, 2022
Alexander Matt Turner, Aseem Saxena, Prasad Tadepalli

Avoiding Side Effects in Complex Environments

Jun 11, 2020
Alexander Matt Turner, Neale Ratzlaff, Prasad Tadepalli

Optimal Farsighted Agents Tend to Seek Power

Jan 19, 2020
Alexander Matt Turner
