Sam Deverett

Measuring AI Agents' Progress on Multi-Step Cyber Attack Scenarios

Mar 11, 2026
Via arXiv

Quantifying Frontier LLM Capabilities for Container Sandbox Escape

Mar 01, 2026
Via arXiv

Improving Methodologies for Agentic Evaluations Across Domains: Leakage of Sensitive Information, Fraud and Cybersecurity Threats

Jan 22, 2026
Via arXiv