
Yuanzhao Zhai


COPR: Continual Human Preference Learning via Optimal Policy Regularization

Feb 27, 2024
Han Zhang, Lin Gui, Yu Lei, Yuanzhao Zhai, Yehong Zhang, Yulan He, Hui Wang, Yue Yu, Kam-Fai Wong, Bin Liang, Ruifeng Xu


Optimistic Model Rollouts for Pessimistic Offline Policy Optimization

Jan 11, 2024
Yuanzhao Zhai, Yiying Li, Zijian Gao, Xudong Gong, Kele Xu, Dawei Feng, Ding Bo, Huaimin Wang


Uncertainty-Penalized Reinforcement Learning from Human Feedback with Diverse Reward LoRA Ensembles

Dec 30, 2023
Yuanzhao Zhai, Han Zhang, Yu Lei, Yue Yu, Kele Xu, Dawei Feng, Bo Ding, Huaimin Wang


COPF: Continual Learning Human Preference through Optimal Policy Fitting

Oct 28, 2023
Han Zhang, Lin Gui, Yuanzhao Zhai, Hui Wang, Yu Lei, Ruifeng Xu


Self-Supervised Exploration via Temporal Inconsistency in Reinforcement Learning

Aug 24, 2022
Zijian Gao, Kele Xu, HengXing Cai, Yuanzhao Zhai, Dawei Feng, Bo Ding, XinJun Mao, Huaimin Wang


Dynamic Memory-based Curiosity: A Bootstrap Approach for Exploration

Aug 24, 2022
Zijian Gao, Kele Xu, YiYing Li, Yuanzhao Zhai, Dawei Feng, Bo Ding, XinJun Mao, Huaimin Wang
