Reward Model Learning vs. Direct Policy Optimization: A Comparative Analysis of Learning from Human Preferences

Mar 04, 2024
Andi Nika, Debmalya Mandal, Parameswaran Kamalaruban, Georgios Tzannetos, Goran Radanović, Adish Singla

Figure 1 for Reward Model Learning vs. Direct Policy Optimization: A Comparative Analysis of Learning from Human Preferences
