Xudong Pan

Invisible Threats from Model Context Protocol: Generating Stealthy Injection Payload via Tree-based Adaptive Search

Mar 25, 2026
Via arXiv

MirrorGuard: Toward Secure Computer-Use Agents via Simulation-to-Real Reasoning Correction

Jan 19, 2026
Via arXiv

WebTrap Park: An Automated Platform for Systematic Security Evaluation of Web Agents

Jan 13, 2026
Via arXiv

When Bots Take the Bait: Exposing and Mitigating the Emerging Social Engineering Attack in Web Automation Agent

Jan 12, 2026
Via arXiv

Evaluation Faking: Unveiling Observer Effects in Safety Evaluation of Frontier AI Systems

May 23, 2025
Via arXiv

ReasoningShield: Content Safety Detection over Reasoning Traces of Large Reasoning Models

May 22, 2025
Via arXiv

Think Twice Before You Act: Enhancing Agent Behavioral Safety with Thought Correction

May 19, 2025
Via arXiv

OpenDeception: Benchmarking and Investigating AI Deceptive Behaviors via Open-ended Interaction Simulation

Apr 18, 2025
Via arXiv

StruPhantom: Evolutionary Injection Attacks on Black-Box Tabular Agents Powered by Large Language Models

Apr 14, 2025
Via arXiv

Frontier AI systems have surpassed the self-replicating red line

Dec 09, 2024
Via arXiv