Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF

Jan 29, 2024
Banghua Zhu, Michael I. Jordan, Jiantao Jiao
