Zhichen Dong
Emulated Disalignment: Safety Alignment for Large Language Models May Backfire!

Feb 21, 2024
Zhanhui Zhou, Jie Liu, Zhichen Dong, Jiaheng Liu, Chao Yang, Wanli Ouyang, Yu Qiao

Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey

Feb 14, 2024
Zhichen Dong, Zhanhui Zhou, Chao Yang, Jing Shao, Yu Qiao
