Mia Taylor

Inoculation Prompting: Eliciting traits from LLMs during training can suppress them at test-time

Oct 05, 2025
Via arXiv

School of Reward Hacks: Hacking harmless tasks generalizes to misaligned behavior in LLMs

Aug 24, 2025
Via arXiv

Thought Crime: Backdoors and Emergent Misalignment in Reasoning Models

Jun 16, 2025
Via arXiv

Model Organisms for Emergent Misalignment

Jun 13, 2025
Via arXiv