Gibbs Sampling from Human Feedback: A Provable KL-constrained Framework for RLHF

Dec 18, 2023
Wei Xiong, Hanze Dong, Chenlu Ye, Han Zhong, Nan Jiang, Tong Zhang
