Mitigating Reward Hacking via Information-Theoretic Reward Modeling

Feb 16, 2024
Yuchun Miao, Sen Zhang, Liang Ding, Rong Bao, Lefei Zhang, Dacheng Tao

View paper on arXiv