Peter Henderson

Best Practices for Biorisk Evaluations on Open-Weight Bio-Foundation Models

Nov 03, 2025
Via arXiv

A Multimodal Benchmark for Framing of Oil & Gas Advertising and Potential Greenwashing Detection

Oct 24, 2025
Via arXiv

Penalizing Transparency? How AI Disclosure and Author Demographics Shape Human and AI Judgments About Writing

Jul 02, 2025
Via arXiv

LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming?

Jun 13, 2025
Via arXiv

Dynamic Risk Assessments for Offensive Cybersecurity Agents

May 23, 2025
Via arXiv

A Reasoning-Focused Legal Retrieval Benchmark

May 06, 2025
Via arXiv

The Deployment of End-to-End Audio Language Models Should Take into Account the Principle of Least Privilege

Mar 21, 2025
Via arXiv

General Scales Unlock AI Evaluation with Explanatory and Predictive Power

Mar 09, 2025
Via arXiv

Breaking Down Bias: On The Limits of Generalizable Pruning Strategies

Feb 11, 2025
Via arXiv

The Mirage of Artificial Intelligence Terms of Use Restrictions

Dec 10, 2024
Via arXiv