Proxy Reward Internalization and Mechanistic Exploitation: A Learned Precursor to Reward Hacking and Its Generalization

Add code
Jun 08, 2026

Share this with someone who'll enjoy it:

View paper onarxiv icon

Share this with someone who'll enjoy it: