Tomek Korbak

Michael Pokorny

Training Agents to Self-Report Misbehavior

Feb 25, 2026
Via arXiv

Async Control: Stress-testing Asynchronous Control Measures for LLM Agents

Dec 15, 2025
Via arXiv

Practical challenges of control monitoring in frontier AI deployments

Dec 15, 2025
Via arXiv

Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety

Jul 15, 2025
Via arXiv

How to evaluate control measures for LLM agents? A trajectory from today to superintelligence

Apr 07, 2025
Via arXiv

Fundamental Limitations in Defending LLM Finetuning APIs

Feb 20, 2025
Via arXiv

A sketch of an AI control safety case

Jan 28, 2025
Via arXiv

Humanity's Last Exam

Jan 24, 2025
Via arXiv

The Two-Hop Curse: LLMs trained on A->B, B->C fail to learn A->C

Nov 25, 2024
Via arXiv

Towards evaluations-based safety cases for AI scheming

Nov 07, 2024
Via arXiv