The Anatomy of Alignment: Decomposing Preference Optimization by Steering Sparse Features

Sep 16, 2025


View paper on arXiv
