The Hydra Effect: Emergent Self-repair in Language Model Computations

Jul 28, 2023
Thomas McGrath, Matthew Rahtz, Janos Kramar, Vladimir Mikulik, Shane Legg

[Figures 1 to 4 from the paper]


