Aaron Mueller

SAEs Are Good for Steering -- If You Select the Right Features

May 26, 2025

How to Improve the Robustness of Closed-Source Models on NLI

May 26, 2025

MIB: A Mechanistic Interpretability Benchmark

Apr 17, 2025

Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora

Apr 10, 2025

Position-aware Automatic Circuit Discovery

Feb 07, 2025

Disjoint Processing Mechanisms of Hierarchical and Linear Grammars in Large Language Models

Jan 15, 2025

Large Language Models Share Representations of Latent Grammatical Concepts Across Typologically Diverse Languages

Jan 10, 2025

Findings of the Second BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora

Dec 06, 2024

Incremental Sentence Processing Mechanisms in Autoregressive Transformer Language Models

Dec 06, 2024

Characterizing the Role of Similarity in the Property Inferences of Language Models

Oct 29, 2024