Martin Leitgab

Endogenous Resistance to Activation Steering in Language Models

Feb 06, 2026
Via arXiv