HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal

Feb 06, 2024
Mantas Mazeika, Long Phan, Xuwang Yin, Andy Zou, Zifan Wang, Norman Mu, Elham Sakhaee, Nathaniel Li, Steven Basart, Bo Li, David Forsyth, Dan Hendrycks
