Prannay Hebbar

SWE-Marathon: Can Agents Autonomously Complete Ultra-Long-Horizon Software Work?

Jun 05, 2026
Via arXiv

SIA: Self Improving AI with Harness & Weight Updates

May 28, 2026
Via arXiv

REAL: Benchmarking Autonomous Agents on Deterministic Simulations of Real Websites

Apr 15, 2025
Via arXiv