Kai Fronsdal

AuditBench: Evaluating Alignment Auditing Techniques on Models with Hidden Behaviors

Feb 26, 2026
Via arXiv

MISR: Measuring Instrumental Self-Reasoning in Frontier Models

Dec 05, 2024
Via arXiv