Holden Karnofsky

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Jan 17, 2024
Via arXiv