Binghai Wang

Secrets of RLHF in Large Language Models Part II: Reward Modeling

Jan 12, 2024
Binghai Wang, Rui Zheng, Lu Chen, Yan Liu, Shihan Dou, Caishuang Huang, Wei Shen, Senjie Jin, Enyu Zhou, Chenyu Shi, Songyang Gao, Nuo Xu, Yuhao Zhou, Xiaoran Fan, Zhiheng Xi, Jun Zhao, Xiao Wang, Tao Ji, Hang Yan, Lixing Shen, Zhan Chen, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang, Zuxuan Wu, Yu-Gang Jiang

Secrets of RLHF in Large Language Models Part I: PPO

Jul 18, 2023
Rui Zheng, Shihan Dou, Songyang Gao, Yuan Hua, Wei Shen, Binghai Wang, Yan Liu, Senjie Jin, Qin Liu, Yuhao Zhou, Limao Xiong, Lu Chen, Zhiheng Xi, Nuo Xu, Wenbin Lai, Minghao Zhu, Cheng Chang, Zhangyue Yin, Rongxiang Weng, Wensen Cheng, Haoran Huang, Tianxiang Sun, Hang Yan, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang
