Title: OPT-BENCH: Evaluating LLM Agent on Large-Scale Search Spaces Optimization Problems

Jun 12, 2025

Xiaozhe Li, Jixuan Chen, Xinyu Fang, Shengyuan Ding, Haodong Duan, Qingwen Liu, Kai Chen

Abstract: Large Language Models (LLMs) have shown remarkable capabilities in solving diverse tasks. However, their ability to iteratively optimize complex solutions by learning from previous feedback remains insufficiently explored. To bridge this gap, we present OPT-BENCH, a comprehensive benchmark designed to evaluate LLM agents on optimization problems with large-scale search spaces. OPT-BENCH includes 20 real-world machine learning tasks sourced from Kaggle and 10 classical NP problems, offering a diverse and challenging environment for assessing LLM agents on iterative reasoning and solution refinement. To enable rigorous evaluation, we introduce OPT-Agent, an end-to-end optimization framework that emulates human reasoning when tackling complex problems: it generates, validates, and iteratively improves solutions by leveraging historical feedback. Through extensive experiments on 9 state-of-the-art LLMs from 6 model families, we analyze the effects of optimization iterations, temperature settings, and model architectures on solution quality and convergence. Our results demonstrate that incorporating historical context significantly enhances optimization performance across both ML and NP tasks. All datasets, code, and evaluation tools are open-sourced to promote further research in advancing LLM-driven optimization and iterative reasoning. Project page: https://github.com/OliverLeeXZ/OPT-BENCH
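The generate, validate, and refine cycle described in the abstract can be illustrated with a toy sketch. This is not the OPT-Agent implementation: the five-city instance, the `propose` function (a deterministic stand-in for an LLM call), and the history record layout are all assumptions made for illustration. The only idea taken from the abstract is that each new candidate is produced with access to the history of earlier attempts and their scores.

```python
import itertools
import math

# Toy 5-city travelling-salesman instance; coordinates are made up.
CITIES = [(0, 0), (0, 3), (4, 3), (4, 0), (2, 5)]


def tour_length(tour):
    """Total length of the closed tour (returns to the starting city)."""
    return sum(
        math.dist(CITIES[a], CITIES[b])
        for a, b in zip(tour, tour[1:] + tour[:1])
    )


def validate(tour):
    """Return an error message for an invalid tour, or None if it is valid."""
    if sorted(tour) != list(range(len(CITIES))):
        return "tour must visit every city exactly once"
    return None


def propose(history):
    """Hypothetical stand-in for the LLM call.

    Takes the best valid tour in the history and reverses one segment of it,
    skipping any candidate that has already been tried.
    """
    if not history:
        return list(range(len(CITIES)))
    seen = {tuple(h["tour"]) for h in history}
    valid = [h for h in history if h["error"] is None]
    best = min(valid, key=lambda h: h["score"])["tour"]
    for i, j in itertools.combinations(range(1, len(best)), 2):
        candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
        if tuple(candidate) not in seen:
            return candidate
    return best  # every neighbour of the best tour was already tried


def optimize(steps):
    """Run the generate -> validate -> score loop, feeding history back in."""
    history = []
    for _ in range(steps):
        tour = propose(history)
        error = validate(tour)
        score = tour_length(tour) if error is None else math.inf
        history.append({"tour": tour, "score": score, "error": error})
    return min(history, key=lambda h: h["score"]), history


if __name__ == "__main__":
    best, history = optimize(steps=10)
    print("best tour:", best["tour"], "length:", round(best["score"], 3))
```

In a real agent, `propose` would be replaced by a model call whose prompt includes the prior candidates, their scores, and any validation errors; the surrounding loop would stay largely the same.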
