Brad Kenstler

Imitation Learning for Multi-turn LM Agents via On-policy Expert Corrections

Dec 16, 2025
Via arXiv

ResearchRubrics: A Benchmark of Prompts and Rubrics For Evaluating Deep Research Agents

Nov 10, 2025
Via arXiv

Remote Labor Index: Measuring AI Automation of Remote Work

Oct 30, 2025
Via arXiv

The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems

Mar 05, 2025
Via arXiv