Picture for Lukas Fluri

Lukas Fluri

The Perils of Optimizing Learned Reward Functions: Low Training Error Does Not Guarantee Low Regret

Add code
Jun 22, 2024
Viaarxiv icon

Evaluating Superhuman Models with Consistency Checks

Add code
Jun 19, 2023
Figure 1 for Evaluating Superhuman Models with Consistency Checks
Figure 2 for Evaluating Superhuman Models with Consistency Checks
Figure 3 for Evaluating Superhuman Models with Consistency Checks
Figure 4 for Evaluating Superhuman Models with Consistency Checks
Viaarxiv icon