Owen Lewis

Reasoning Theater: Disentangling Model Beliefs from Chain-of-Thought

Mar 05, 2026
Via arXiv

Features as Rewards: Scalable Supervision for Open-Ended Tasks via Interpretability

Feb 11, 2026
Via arXiv

The Shape of Beliefs: Geometry, Dynamics, and Interventions along Representation Manifolds of Language Models' Posteriors

Feb 02, 2026
Via arXiv

Localizing Paragraph Memorization in Language Models

Mar 28, 2024
Via arXiv