Avery Griffin

Eliciting Harmful Capabilities by Fine-Tuning On Safeguarded Outputs

Jan 20, 2026

The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks

May 17, 2024