Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jiayun Dong

\$OneMillion-Bench: How Far are Language Agents from Human Experts?

Mar 09, 2026

Qianyu Yang, Yang Liu, Jiaqi Li, Jun Bai, Hao Chen, Kaiyuan Chen, Tiliang Duan, Jiayun Dong, Xiaobo Hu, Zixia Jia(+12 more)

Abstract:As language models (LMs) evolve from chat assistants to long-horizon agents capable of multi-step reasoning and tool use, existing benchmarks remain largely confined to structured or exam-style tasks that fall short of real-world professional demands. To this end, we introduce \$OneMillion-Bench \$OneMillion-Bench, a benchmark of 400 expert-curated tasks spanning Law, Finance, Industry, Healthcare, and Natural Science, built to evaluate agents across economically consequential scenarios. Unlike prior work, the benchmark requires retrieving authoritative sources, resolving conflicting evidence, applying domain-specific rules, and making constraint decisions, where correctness depends as much on the reasoning process as the final answer. We adopt a rubric-based evaluation protocol scoring factual accuracy, logical coherence, practical feasibility, and professional compliance, focused on expert-level problems to ensure meaningful differentiation across agents. Together, \$OneMillion-Bench provides a unified testbed for assessing agentic reliability, professional depth, and practical readiness in domain-intensive scenarios.

* 39 pages, 9 figures, 8 tables

Via

Access Paper or Ask Questions

Variable Importance Clouds: A Way to Explore Variable Importance for the Set of Good Models

Jan 10, 2019

Jiayun Dong, Cynthia Rudin

Figure 1 for Variable Importance Clouds: A Way to Explore Variable Importance for the Set of Good Models

Figure 2 for Variable Importance Clouds: A Way to Explore Variable Importance for the Set of Good Models

Figure 3 for Variable Importance Clouds: A Way to Explore Variable Importance for the Set of Good Models

Figure 4 for Variable Importance Clouds: A Way to Explore Variable Importance for the Set of Good Models

Abstract:Variable importance is central to scientific studies, including the social sciences and causal inference, healthcare, and in other domains. However, current notions of variable importance are often tied to a specific predictive model. This is problematic: what if there were multiple well-performing predictive models, and a specific variable is important to some of them and not to others? In that case, we may not be able to tell from a single well-performing model whether a variable is always important in predicting the outcome. Rather than depending on variable importance for a single predictive model, we would like to explore variable importance for all approximately-equally-accurate predictive models. This work introduces the concept of a variable importance cloud, which maps every variable to its importance for every good predictive model. We show properties of the variable importance cloud and draw connections other areas of statistics. We introduce variable importance diagrams as a projection of the variable importance cloud into two dimensions for visualization purposes. Experiments with criminal justice and marketing data illustrate how variables can change dramatically in importance for approximately-equally-accurate predictive models.

Via

Access Paper or Ask Questions