Xander Davies

Running the Gauntlet: Re-evaluating the Capabilities of Agents Beyond Familiar Environments

Jun 12, 2026
Via arXiv

Gaming AI-Assisted Peer Reviews Poses New Risks to the Scientific Community

Jun 08, 2026
Via arXiv

Evaluating whether AI models would sabotage AI safety research

Apr 27, 2026
Via arXiv

UK AISI Alignment Evaluation Case-Study

Apr 01, 2026
Via arXiv

How Vulnerable Are AI Agents to Indirect Prompt Injections? Insights from a Large-Scale Public Competition

Mar 16, 2026
Via arXiv

Boundary Point Jailbreaking of Black-Box LLMs

Feb 16, 2026
Via arXiv

Breaking Agent Backbones: Evaluating the Security of Backbone LLMs in AI Agents

Oct 26, 2025
Via arXiv

Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples

Oct 08, 2025
Via arXiv

Security Challenges in AI Agent Deployment: Insights from a Large Scale Public Competition

Jul 28, 2025
Via arXiv

An Example Safety Case for Safeguards Against Misuse

May 23, 2025
Via arXiv