Aaquib Syed

Mechanistic Unlearning: Robust Knowledge Unlearning and Editing via Mechanistic Localization

Oct 16, 2024
Via arXiv

Refusal in Language Models Is Mediated by a Single Direction

Jun 17, 2024
Via arXiv

Attribution Patching Outperforms Automated Circuit Discovery

Oct 16, 2023
Via arXiv