Towards Best Practices of Activation Patching in Language Models: Metrics and Methods

Sep 27, 2023


View paper on arXiv
