Stephen Casper

TamperBench: Systematically Stress-Testing LLM Safety Under Fine-Tuning and Tampering

Feb 06, 2026
Via arXiv

Internal Deployment Gaps in AI Regulation

Jan 12, 2026
Via arXiv

The Singapore Consensus on Global AI Safety Research Priorities

Jun 25, 2025
Via arXiv

Practical Principles for AI Cost and Compute Accounting

Feb 21, 2025
Via arXiv

Adversarial Alignment for LLMs Requires Simpler, Reproducible, and More Measurable Objectives

Feb 17, 2025
Via arXiv

The AI Agent Index

Feb 03, 2025
Via arXiv

International AI Safety Report

Jan 29, 2025
Via arXiv

Open Problems in Mechanistic Interpretability

Jan 27, 2025
Via arXiv

Open Problems in Machine Unlearning for AI Safety

Jan 09, 2025
Via arXiv

Obfuscated Activations Bypass LLM Latent-Space Defenses

Dec 12, 2024
Via arXiv