Evaluating AI agents in finance faces two key challenges: static benchmarks require costly expert annotation yet miss the dynamic decision-making central to real-world trading, while LLM-based judges introduce uncontrolled variance on domain-specific tasks. We introduce TraderBench, a benchmark that addresses both issues. It combines expert-verified static tasks (knowledge retrieval, analytical reasoning) with adversarial trading simulations scored purely on realized performance (Sharpe ratio, returns, and drawdown), eliminating judge variance entirely. The framework features two novel tracks: crypto trading with four progressive market-manipulation transforms, and options derivatives scoring across P&L accuracy, Greeks, and risk management. Trading scenarios can be refreshed with new market data to prevent benchmark contamination. Evaluating 13 models, from 8B open-source models to frontier systems, on ~50 tasks, we find: (1) 8 of 13 models score ~33 on crypto with under 1 point of variation across adversarial conditions, exposing fixed, non-adaptive strategies; (2) extended thinking helps retrieval (+26 points) but has negligible impact on trading (+0.3 crypto, -0.1 options). These findings reveal that current agents lack genuine market adaptation, underscoring the need for performance-grounded evaluation in finance.
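For concreteness, the following is a minimal sketch of how performance-grounded scores of this kind could be computed from a realized returns series; the function name, zero risk-free rate, and annualization convention are illustrative assumptions, not TraderBench's actual implementation.

```python
import numpy as np

def performance_scores(returns: np.ndarray, periods_per_year: int = 365) -> dict:
    """Compute realized-performance metrics from per-period returns.

    Sketch only: the exact conventions (annualization factor,
    risk-free rate) are assumptions, not the benchmark's code.
    """
    # Cumulative return over the trading episode.
    total_return = np.prod(1.0 + returns) - 1.0

    # Annualized Sharpe ratio, assuming a zero risk-free rate.
    sharpe = np.mean(returns) / np.std(returns, ddof=1) * np.sqrt(periods_per_year)

    # Maximum drawdown: largest peak-to-trough decline of the equity curve.
    equity = np.cumprod(1.0 + returns)
    running_peak = np.maximum.accumulate(equity)
    max_drawdown = np.min(equity / running_peak - 1.0)

    return {"total_return": total_return,
            "sharpe": sharpe,
            "max_drawdown": max_drawdown}

# Example: score a week of simulated daily returns.
rng = np.random.default_rng(0)
print(performance_scores(rng.normal(0.001, 0.02, size=7)))
```

Because every quantity above is derived deterministically from the realized price path, such scoring involves no LLM judge and therefore no judge variance.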