Eric J. Michaud

Survival of the Fittest Representation: A Case Study with Modular Addition

May 27, 2024

Not All Language Model Features Are Linear

May 23, 2024

Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models

Mar 31, 2024

Opening the AI black box: program synthesis via mechanistic interpretability

Feb 07, 2024

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

Jul 27, 2023

The Quantization Model of Neural Scaling

Mar 23, 2023

Precision Machine Learning

Oct 24, 2022

Omnigrok: Grokking Beyond Algorithmic Data

Oct 03, 2022

Towards Understanding Grokking: An Effective Theory of Representation Learning

May 20, 2022

Understanding Learned Reward Functions

Dec 10, 2020