Marcus Williams

CTRL-Rec: Controlling Recommender Systems With Natural Language

Oct 14, 2025
Via arXiv

Stress Testing Deliberative Alignment for Anti-Scheming Training

Sep 19, 2025
Via arXiv

Targeted Manipulation and Deception Emerge when Optimizing LLMs for User Feedback

Nov 04, 2024
Via arXiv

Multi-objective Reinforcement learning from AI Feedback

Jun 12, 2024
Via arXiv

On The Expressivity of Objective-Specification Formalisms in Reinforcement Learning

Oct 18, 2023
Via arXiv