Gauri Kholkar

CAPTURE: Context-Aware Prompt Injection Testing and Robustness Enhancement

May 18, 2025

Gauri Kholkar, Ratinder Ahuja

Abstract: Prompt injection remains a major security risk for large language models. However, the efficacy of existing guardrail models in context-aware settings remains underexplored, as they often rely on static attack benchmarks. Additionally, they have over-defense tendencies. We introduce CAPTURE, a novel context-aware benchmark assessing both attack detection and over-defense tendencies with minimal in-domain examples. Our experiments reveal that current prompt injection guardrail models suffer from high false negatives in adversarial cases and excessive false positives in benign scenarios, highlighting critical limitations.

* Accepted at the ACL LLMSec Workshop 2025
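
The two error types named in the CAPTURE abstract above, false negatives on attacks and false positives on benign inputs, can be illustrated with a minimal sketch. This is not the paper's evaluation code: the function name `guardrail_error_rates`, the `ErrorRates` container, and the toy labels are all hypothetical, and it assumes each input carries a boolean ground-truth label (`True` for an injection attack) and a boolean guardrail decision (`True` for flagged).

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ErrorRates:
    """Hypothetical container for the two error rates discussed above."""
    false_negative_rate: float  # share of attacks the guardrail missed
    false_positive_rate: float  # share of benign inputs wrongly flagged (over-defense)


def guardrail_error_rates(is_attack: Sequence[bool],
                          flagged: Sequence[bool]) -> ErrorRates:
    """Compute miss rate and over-defense rate from paired labels and decisions."""
    if len(is_attack) != len(flagged):
        raise ValueError("labels and decisions must have the same length")

    attacks = sum(1 for a in is_attack if a)
    benign = len(is_attack) - attacks

    # An attack that was not flagged is a false negative.
    missed = sum(1 for a, f in zip(is_attack, flagged) if a and not f)
    # A benign input that was flagged is a false positive.
    over_blocked = sum(1 for a, f in zip(is_attack, flagged) if not a and f)

    # Report 0.0 when a class is absent, to avoid dividing by zero.
    fnr = missed / attacks if attacks else 0.0
    fpr = over_blocked / benign if benign else 0.0
    return ErrorRates(false_negative_rate=fnr, false_positive_rate=fpr)


# Toy data (made up for illustration): two attacks, three benign inputs.
labels = [True, True, False, False, False]
decisions = [True, False, True, False, False]
rates = guardrail_error_rates(labels, decisions)
print(rates)
```

Reporting the two rates separately, rather than a single accuracy figure, keeps missed attacks and over-defense visible as distinct failure modes.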

Socio-Culturally Aware Evaluation Framework for LLM-Based Content Moderation

Dec 18, 2024

Shanu Kumar, Gauri Kholkar, Saish Mendke, Anubhav Sadana, Parag Agrawal, Sandipan Dandapat

Abstract: With the growth of social media and large language models, content moderation has become crucial. Many existing datasets lack adequate representation of different groups, resulting in unreliable assessments. To tackle this, we propose a socio-culturally aware evaluation framework for LLM-driven content moderation and introduce a scalable method for creating diverse datasets using persona-based generation. Our analysis reveals that these datasets provide broader perspectives and pose greater challenges for LLMs than diversity-focused generation methods without personas. This challenge is especially pronounced in smaller LLMs, emphasizing the difficulties they encounter in moderating such diverse content.

* Accepted at the SUMEval Workshop at COLING 2025
