Thomas McGrath

Copy Suppression: Comprehensively Understanding an Attention Head

Oct 06, 2023
Callum McDougall, Arthur Conmy, Cody Rushing, Thomas McGrath, Neel Nanda

Via arXiv

The Hydra Effect: Emergent Self-repair in Language Model Computations

Jul 28, 2023
Thomas McGrath, Matthew Rahtz, János Kramár, Vladimir Mikulik, Shane Legg

Via arXiv

Tracr: Compiled Transformers as a Laboratory for Interpretability

Jan 12, 2023
David Lindner, János Kramár, Matthew Rahtz, Thomas McGrath, Vladimir Mikulik

Via arXiv

Acquisition of Chess Knowledge in AlphaZero

Nov 27, 2021
Thomas McGrath, Andrei Kapishnikov, Nenad Tomašev, Adam Pearce, Demis Hassabis, Been Kim, Ulrich Paquet, Vladimir Kramnik

Via arXiv