Kai Fronsdal

Evaluating whether AI models would sabotage AI safety research

Apr 27, 2026
Via arXiv

AuditBench: Evaluating Alignment Auditing Techniques on Models with Hidden Behaviors

Feb 26, 2026
Via arXiv

MISR: Measuring Instrumental Self-Reasoning in Frontier Models

Dec 05, 2024
Via arXiv