Kaarel Hänni

Mathematical Models of Computation in Superposition

Aug 10, 2024
Via arXiv

Cluster-norm for Unsupervised Probing of Knowledge

Jul 26, 2024
Via arXiv

The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks

May 17, 2024
Via arXiv

Using Degeneracy in the Loss Landscape for Mechanistic Interpretability

May 17, 2024
Via arXiv