Improving Language Models with Advantage-based Offline Policy Gradients

May 24, 2023
Ashutosh Baheti, Ximing Lu, Faeze Brahman, Ronan Le Bras, Maarten Sap, Mark Riedl

View paper on arXiv