Can Rager

Colorado School of Mines, Department of Applied Mathematics and Statistics

Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models

Mar 31, 2024
Samuel Marks, Can Rager, Eric J. Michaud, Yonatan Belinkov, David Bau, Aaron Mueller

Structured World Representations in Maze-Solving Transformers

Dec 05, 2023
Michael Igorevich Ivanitskiy, Alex F. Spies, Tilman Räuker, Guillaume Corlouer, Chris Mathwin, Lucia Quirke, Can Rager, Rusheb Shah, Dan Valentine, Cecilia Diniz Behn, Katsumi Inoue, Samy Wu Fung

Attribution Patching Outperforms Automated Circuit Discovery

Oct 16, 2023
Aaquib Syed, Can Rager, Arthur Conmy

An Adversarial Example for Direct Logit Attribution: Memory Management in gelu-4l

Oct 14, 2023
James Dao, Yeu-Tong Lau, Can Rager, Jett Janiak

A Configurable Library for Generating and Manipulating Maze Datasets

Sep 19, 2023
Michael Igorevich Ivanitskiy, Rusheb Shah, Alex F. Spies, Tilman Räuker, Dan Valentine, Can Rager, Lucia Quirke, Chris Mathwin, Guillaume Corlouer, Cecilia Diniz Behn, Samy Wu Fung