Khalil Al-Hussaeni

A Benchmark for Evaluating Outcome-Driven Constraint Violations in Autonomous AI Agents

Dec 23, 2025
Via arXiv