AutoDAN: Automatic and Interpretable Adversarial Attacks on Large Language Models

Oct 23, 2023
Sicheng Zhu, Ruiyi Zhang, Bang An, Gang Wu, Joe Barrow, Zichao Wang, Furong Huang, Ani Nenkova, Tong Sun

View paper on arXiv
