Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs

Feb 26, 2024
Arash Ahmadian, Chris Cremer, Matthias Gallé, Marzieh Fadaee, Julia Kreutzer, Olivier Pietquin, Ahmet Üstün, Sara Hooker

View paper on arXiv