Thomas McGrath

Stories in Space: In-Context Learning Trajectories in Conceptual Belief Space

May 12, 2026

Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior

May 06, 2026

Features as Rewards: Scalable Supervision for Open-Ended Tasks via Interpretability

Feb 11, 2026

Copy Suppression: Comprehensively Understanding an Attention Head

Oct 06, 2023

The Hydra Effect: Emergent Self-repair in Language Model Computations

Jul 28, 2023

Tracr: Compiled Transformers as a Laboratory for Interpretability

Jan 12, 2023

Acquisition of Chess Knowledge in AlphaZero

Nov 27, 2021