Zhifeng Lu

Interpretable Safety Alignment via SAE-Constructed Low-Rank Subspace Adaptation

Dec 29, 2025
Via arXiv