Provable Benefits of Policy Learning from Human Preferences in Contextual Bandit Problems

Jul 24, 2023
Xiang Ji, Huazheng Wang, Minshuo Chen, Tuo Zhao, Mengdi Wang
