Lukas Fluri

The Perils of Optimizing Learned Reward Functions: Low Training Error Does Not Guarantee Low Regret

Jun 22, 2024
Via arXiv

Evaluating Superhuman Models with Consistency Checks

Jun 19, 2023
Via arXiv