Preference Poisoning Attacks on Reward Model Learning

Feb 02, 2024
Junlin Wu, Jiongxiao Wang, Chaowei Xiao, Chenguang Wang, Ning Zhang, Yevgeniy Vorobeychik
