Emulated Disalignment: Safety Alignment for Large Language Models May Backfire!

Feb 21, 2024
Zhanhui Zhou, Jie Liu, Zhichen Dong, Jiaheng Liu, Chao Yang, Wanli Ouyang, Yu Qiao
