Xinbo Xu

Safe RLHF: Safe Reinforcement Learning from Human Feedback

Oct 19, 2023
Josef Dai, Xuehai Pan, Ruiyang Sun, Jiaming Ji, Xinbo Xu, Mickel Liu, Yizhou Wang, Yaodong Yang

Via arXiv