Michael Schmatz

Measuring AI Agents' Progress on Multi-Step Cyber Attack Scenarios

Mar 11, 2026
Via arXiv

Improving Methodologies for Agentic Evaluations Across Domains: Leakage of Sensitive Information, Fraud and Cybersecurity Threats

Jan 22, 2026
Via arXiv

RepliBench: Evaluating the autonomous replication capabilities of language model agents

Apr 21, 2025
Via arXiv