Joseph Miller

Open Problems in Mechanistic Interpretability

Jan 27, 2025

Gradient Routing: Masking Gradients to Localize Computation in Neural Networks

Oct 06, 2024

Transformer Circuit Faithfulness Metrics are not Robust

Jul 11, 2024

Adversarial Policies Beat Professional-Level Go AIs

Nov 01, 2022