Owen Lewis

Anatomy of Post-Training: Using Interpretability to Characterize Data and Shape the Learning Signal

Jun 10, 2026
Via arXiv

Stories in Space: In-Context Learning Trajectories in Conceptual Belief Space

May 12, 2026
Via arXiv

Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior

May 06, 2026
Via arXiv

Reasoning Theater: Disentangling Model Beliefs from Chain-of-Thought

Mar 05, 2026
Via arXiv

Features as Rewards: Scalable Supervision for Open-Ended Tasks via Interpretability

Feb 11, 2026
Via arXiv

The Shape of Beliefs: Geometry, Dynamics, and Interventions along Representation Manifolds of Language Models' Posteriors

Feb 02, 2026
Via arXiv

Localizing Paragraph Memorization in Language Models

Mar 28, 2024
Via arXiv