Alexey Dontsov

The Rogue Scalpel: Activation Steering Compromises LLM Safety

Sep 26, 2025

Lightweight error mitigation strategies for post-training N:M activation sparsity in LLMs

Sep 26, 2025

OrtSAE: Orthogonal Sparse Autoencoders Uncover Atomic Features

Sep 26, 2025

I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders

Mar 24, 2025

CLEAR: Character Unlearning in Textual and Visual Modalities

Oct 23, 2024