Marcus Williams

Tony

Monitoring Monitorability

Dec 20, 2025

OpenAI GPT-5 System Card

Dec 19, 2025

CTRL-Rec: Controlling Recommender Systems With Natural Language

Oct 14, 2025

Stress Testing Deliberative Alignment for Anti-Scheming Training

Sep 19, 2025

Targeted Manipulation and Deception Emerge when Optimizing LLMs for User Feedback

Nov 04, 2024

Multi-objective Reinforcement learning from AI Feedback

Jun 12, 2024

On The Expressivity of Objective-Specification Formalisms in Reinforcement Learning

Oct 18, 2023