Policy Optimization in RLHF: The Impact of Out-of-preference Data

Dec 17, 2023
Ziniu Li, Tian Xu, Yang Yu
