Niels Warncke

Inoculation Prompting: Eliciting traits from LLMs during training can suppress them at test-time

Oct 05, 2025

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

Feb 25, 2025