Emoji Attack: A Method for Misleading Judge LLMs in Safety Risk Detection

Nov 01, 2024


