Henry Papadatos

Evaluating the Goal-Directedness of Large Language Models

Apr 16, 2025
Via arXiv

Mapping AI Benchmark Data to Quantitative Risk Estimates Through Expert Elicitation

Mar 06, 2025
Via arXiv

A Frontier AI Risk Management Framework: Bridging the Gap Between Current AI Practices and Established Risk Management

Feb 10, 2025
Via arXiv

Linear Probe Penalties Reduce LLM Sycophancy

Dec 01, 2024
Via arXiv