Jacob Steinhardt

ADAG: Automatically Describing Attribution Graphs

Apr 08, 2026
Via arXiv

Learning a Generative Meta-Model of LLM Activations

Feb 06, 2026
Via arXiv

Language Model Circuits Are Sparse in the Neuron Basis

Jan 30, 2026
Via arXiv

Predictive Concept Decoders: Training Scalable End-to-End Interpretability Assistants

Dec 17, 2025
Via arXiv

Training Language Models to Explain Their Own Computations

Nov 11, 2025
Via arXiv

Establishing Best Practices for Building Rigorous Agentic Benchmarks

Jul 03, 2025
Via arXiv

Understanding In-context Learning of Addition via Activation Subspaces

May 08, 2025
Via arXiv

Uncovering Gaps in How Humans and LLMs Interpret Subjective Language

Mar 06, 2025
Via arXiv

Which Attention Heads Matter for In-Context Learning?

Feb 19, 2025
Via arXiv

Eliciting Language Model Behaviors with Investigator Agents

Feb 03, 2025
Via arXiv