Oliver Sourbut

RepliBench: Evaluating the autonomous replication capabilities of language model agents

Apr 21, 2025

Foundational Challenges in Assuring Alignment and Safety of Large Language Models

Apr 15, 2024

Cooperation and Control in Delegation Games

Feb 24, 2024