
Jack Merullo

Reasoning Theater: Disentangling Model Beliefs from Chain-of-Thought

Mar 05, 2026

Features as Rewards: Scalable Supervision for Open-Ended Tasks via Interpretability

Feb 11, 2026

The Shape of Beliefs: Geometry, Dynamics, and Interventions along Representation Manifolds of Language Models' Posteriors

Feb 02, 2026

Transferring Features Across Language Models With Model Stitching

Jun 07, 2025

On Linear Representations and Pretraining Data Frequency in Language Models

Apr 16, 2025

$100K or 100 Days: Trade-offs when Pre-Training with Academic Resources

Oct 30, 2024

Talking Heads: Understanding Inter-layer Communication in Transformer Language Models

Jun 13, 2024

Dual Process Learning: Controlling Use of In-Context vs. In-Weights Strategies with Weight Forgetting

May 28, 2024

Axiomatic Causal Interventions for Reverse Engineering Relevance Computation in Neural Retrieval Models

May 03, 2024

Transformer Mechanisms Mimic Frontostriatal Gating Operations When Trained on Human Working Memory Tasks

Feb 13, 2024