Xander Davies

Circuit Breaking: Removing Model Behaviors with Targeted Ablation

Sep 12, 2023
Via arXiv

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

Jul 27, 2023
Via arXiv

Discovering Variable Binding Circuitry with Desiderata

Jul 07, 2023
Via arXiv

Sparse Distributed Memory is a Continual Learner

Mar 20, 2023
Via arXiv

Unifying Grokking and Double Descent

Mar 10, 2023
Via arXiv