Ivan Bercovich

Heuresis: Search Strategies for Autonomous AI Research Agents Across Quality, Diversity and Novelty

Jun 23, 2026

Hardening Agent Benchmarks with Adversarial Hacker-Fixer Loops

Jun 08, 2026

SWE-Marathon: Can Agents Autonomously Complete Ultra-Long-Horizon Software Work?

Jun 05, 2026

Terminal Wrench: A Dataset of 331 Reward-Hackable Environments and 3,632 Exploit Trajectories

Apr 19, 2026

Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces

Jan 17, 2026

Agents of Change: Self-Evolving LLM Agents for Strategic Planning

Jun 05, 2025

HardTests: Synthesizing High-Quality Test Cases for LLM Coding

May 30, 2025

MessIRve: A Large-Scale Spanish Information Retrieval Dataset

Sep 09, 2024