Prior Constraints-based Reward Model Training for Aligning Large Language Models

Apr 01, 2024
Hang Zhou, Chenglong Wang, Yimin Hu, Tong Xiao, Chunliang Zhang, Jingbo Zhu
