Artem Zhuravel

Agent-Diff: Benchmarking LLM Agents on Enterprise API Tasks via Code Execution with State-Diff-Based Evaluation

Feb 11, 2026
Via arXiv