Alert button

GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models

Feb 05, 2024
Haibo Jin, Ruoxi Chen, Andy Zhou, Jinyin Chen, Yang Zhang, Haohan Wang

Share this with someone who'll enjoy it:

View paper onarxiv icon

Share this with someone who'll enjoy it: