Joan Cabezas

SWE-Marathon: Can Agents Autonomously Complete Ultra-Long-Horizon Software Work?

Jun 05, 2026
Via arXiv

CUBE: A Standard for Unifying Agent Benchmarks

Mar 16, 2026
Via arXiv