
Ivan Evtimov

Jack

RL Is a Hammer and LLMs Are Nails: A Simple Reinforcement Learning Recipe for Strong Prompt Injection

Oct 06, 2025

WASP: Benchmarking Web Agent Security Against Prompt Injection Attacks

Apr 30, 2025

AgentDAM: Privacy Leakage Evaluation for Autonomous Web Agents

Mar 12, 2025

AdvPrefix: An Objective for Nuanced LLM Jailbreaks

Dec 13, 2024

Persistent Pre-Training Poisoning of LLMs

Oct 17, 2024

Gradient-based Jailbreak Images for Multimodal Fusion Models

Oct 04, 2024

Automated Red Teaming with GOAT: the Generative Offensive Agent Tester

Oct 02, 2024

The Llama 3 Herd of Models

Jul 31, 2024

Uncertainty-Based Abstention in LLMs Improves Safety and Reduces Hallucinations

Apr 16, 2024

Towards Red Teaming in Multimodal and Multilingual Translation

Jan 29, 2024