Defining and Characterizing Reward Hacking

Sep 27, 2022
Joar Skalse, Nikolaus H. R. Howe, Dmitrii Krasheninnikov, David Krueger
