Sparse Autoencoder


Decomposing MLP Activations into Interpretable Features via Semi-Nonnegative Matrix Factorization

Jun 12, 2025
Via arXiv

Resa: Transparent Reasoning Models via SAEs

Jun 11, 2025
Via arXiv

Advanced fraud detection using machine learning models: enhancing financial transaction security

Jun 12, 2025
Via arXiv

Training Superior Sparse Autoencoders for Instruct Models

Jun 09, 2025
Via arXiv

SWAT-NN: Simultaneous Weights and Architecture Training for Neural Networks in a Latent Space

Jun 11, 2025
Via arXiv

Improving LLM Reasoning through Interpretable Role-Playing Steering

Jun 09, 2025
Via arXiv

Guided Graph Compression for Quantum Graph Neural Networks

Jun 11, 2025
Via arXiv

Transferring Features Across Language Models With Model Stitching

Jun 07, 2025
Via arXiv

Sparse Autoencoders, Again?

Jun 06, 2025
Via arXiv

Evaluating Sparse Autoencoders: From Shallow Design to Matching Pursuit

Jun 05, 2025
Via arXiv