Gregory N. Frank

How Alignment Routes: Localizing, Scaling, and Controlling Policy Circuits in Language Models

Apr 07, 2026
Via arXiv

Detection Is Cheap, Routing Is Learned: Why Refusal-Based Alignment Evaluation Fails

Mar 18, 2026
Via arXiv