Anthony Payne

Evaluating Language Models for Harmful Manipulation

Mar 26, 2026
Via arXiv