Nathaniel Mitrani Hadida

Behavioural Analysis of Alignment Faking

May 26, 2026
Via arXiv

Chain-of-thought obfuscation learned from output supervision can generalise to unseen tasks

Jan 30, 2026
Via arXiv