Picture for Anietta Weckauff

Anietta Weckauff

Emergently Misaligned Language Models Show Behavioral Self-Awareness That Shifts With Subsequent Realignment

Add code
Feb 16, 2026
Viaarxiv icon