David Wagner

Meta SecAlign: A Secure Foundation LLM Against Prompt Injection Attacks

Jul 03, 2025

JULI: Jailbreak Large Language Models by Self-Introspection

May 17, 2025

Toxicity Detection for Free

May 29, 2024

Certifiably Robust RAG against Retrieval Corruption

May 24, 2024

Vulnerability Detection with Code Language Models: How Far Are We?

Mar 27, 2024

Generative AI Security: Challenges and Countermeasures

Feb 20, 2024

PAL: Proxy-Guided Black-Box Attack on Large Language Models

Feb 15, 2024

Jatmo: Prompt Injection Defense by Task-Specific Finetuning

Jan 08, 2024

Mark My Words: Analyzing and Evaluating Language Model Watermarks

Dec 07, 2023

Can LLMs Follow Simple Rules?

Nov 06, 2023