Xander Davies

Circuit Breaking: Removing Model Behaviors with Targeted Ablation

Sep 12, 2023
Maximilian Li, Xander Davies, Max Nadeau

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

Jul 27, 2023
Stephen Casper, Xander Davies, Claudia Shi, Thomas Krendl Gilbert, Jérémy Scheurer, Javier Rando, Rachel Freedman, Tomasz Korbak, David Lindner, Pedro Freire, Tony Wang, Samuel Marks, Charbel-Raphaël Segerie, Micah Carroll, Andi Peng, Phillip Christoffersen, Mehul Damani, Stewart Slocum, Usman Anwar, Anand Siththaranjan, Max Nadeau, Eric J. Michaud, Jacob Pfau, Dmitrii Krasheninnikov, Xin Chen, Lauro Langosco, Peter Hase, Erdem Bıyık, Anca Dragan, David Krueger, Dorsa Sadigh, Dylan Hadfield-Menell

Discovering Variable Binding Circuitry with Desiderata

Jul 07, 2023
Xander Davies, Max Nadeau, Nikhil Prakash, Tamar Rott Shaham, David Bau

Sparse Distributed Memory is a Continual Learner

Mar 20, 2023
Trenton Bricken, Xander Davies, Deepak Singh, Dmitry Krotov, Gabriel Kreiman

Unifying Grokking and Double Descent

Mar 10, 2023
Xander Davies, Lauro Langosco, David Krueger
