Anton Korznikov

Sanity Checks for Sparse Autoencoders: Do SAEs Beat Random Baselines?

Feb 15, 2026
Via arXiv

OrtSAE: Orthogonal Sparse Autoencoders Uncover Atomic Features

Sep 26, 2025
Via arXiv

The Rogue Scalpel: Activation Steering Compromises LLM Safety

Sep 26, 2025
Via arXiv