Or Shafran

From Directions to Regions: Decomposing Activations in Language Models via Local Geometry

Feb 02, 2026
Via arXiv

Decomposing MLP Activations into Interpretable Features via Semi-Nonnegative Matrix Factorization

Jun 12, 2025
Via arXiv