Arkil Patel

Michael Pokorny

Toward Open Weight Models Without Risks: Separating Public and Private Capabilities in LLMs

Jun 19, 2026

Forecasting Downstream Performance of LLMs With Proxy Metrics

May 18, 2026

AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories

Apr 11, 2025

SafeArena: Evaluating the Safety of Autonomous Web Agents

Mar 06, 2025

How to Get Your LLM to Generate Challenging Problems for Evaluation

Feb 20, 2025

Humanity's Last Exam

Jan 24, 2025

Universal Adversarial Triggers Are Not Universal

Apr 24, 2024

Evaluating In-Context Learning of Libraries for Code Generation

Nov 16, 2023

MAGNIFICo: Evaluating the In-Context Learning Ability of Large Language Models to Generalize to Novel Interpretations

Oct 18, 2023

Understanding In-Context Learning in Transformers and LLMs by Learning to Learn Discrete Functions

Oct 04, 2023