Eric J. Michaud

On the creation of narrow AI: hierarchy and nonlocality of neural network skills

May 21, 2025
Via arXiv

Open Problems in Mechanistic Interpretability

Jan 27, 2025
Via arXiv

Physics of Skill Learning

Jan 21, 2025
Via arXiv

Efficient Dictionary Learning with Switch Sparse Autoencoders

Oct 10, 2024
Via arXiv

Survival of the Fittest Representation: A Case Study with Modular Addition

May 27, 2024
Via arXiv

Not All Language Model Features Are Linear

May 23, 2024
Via arXiv

Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models

Mar 31, 2024
Via arXiv

Opening the AI black box: program synthesis via mechanistic interpretability

Feb 07, 2024
Via arXiv

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

Jul 27, 2023
Via arXiv

The Quantization Model of Neural Scaling

Mar 23, 2023
Via arXiv