Atticus Geiger

Reasoning Theater: Disentangling Model Beliefs from Chain-of-Thought

Mar 05, 2026

Surgical Activation Steering via Generative Causal Mediation

Feb 17, 2026

The Shape of Beliefs: Geometry, Dynamics, and Interventions along Representation Manifolds of Language Models' Posteriors

Feb 02, 2026

From Directions to Regions: Decomposing Activations in Language Models via Local Geometry

Feb 02, 2026

Are language models aware of the road not taken? Token-level uncertainty and hidden state dynamics

Nov 06, 2025

Decomposing MLP Activations into Interpretable Features via Semi-Nonnegative Matrix Factorization

Jun 12, 2025

How Do Transformers Learn Variable Binding in Symbolic Programs?

May 27, 2025

Language Models use Lookbacks to Track Beliefs

May 20, 2025

MIB: A Mechanistic Interpretability Benchmark

Apr 17, 2025

Combining Causal Models for More Accurate Abstractions of Neural Networks

Mar 14, 2025