Aaron Mueller

Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics

Oct 28, 2024

The Quest for the Right Mediator: A History, Survey, and Theoretical Grounding of Causal Interpretability

Aug 02, 2024

NNsight and NDIF: Democratizing Access to Foundation Model Internals

Jul 18, 2024

Missed Causes and Ambiguous Effects: Counterfactuals Pose Challenges for Interpreting Neural Networks

Jul 05, 2024

[Call for Papers] The 2nd BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus

Apr 09, 2024

Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models

Mar 31, 2024

In-context Learning Generalizes, But Not Always Robustly: The Case of Syntax

Nov 13, 2023

Function Vectors in Large Language Models

Oct 23, 2023

Meta-training with Demonstration Retrieval for Efficient Few-shot Learning

Jun 30, 2023

Inverse Scaling: When Bigger Isn't Better

Jun 15, 2023