Stabilizing RLHF through Advantage Model and Selective Rehearsal

Sep 18, 2023
Baolin Peng, Linfeng Song, Ye Tian, Lifeng Jin, Haitao Mi, Dong Yu


View paper on arXiv
