Wes Gurnee

Multilevel Interpretability Of Artificial Neural Networks: Leveraging Framework And Methods From Neuroscience

Aug 26, 2024
Via arXiv

The Remarkable Robustness of LLMs: Stages of Inference?

Jun 27, 2024
Via arXiv

Confidence Regulation Neurons in Language Models

Jun 24, 2024
Via arXiv

Refusal in Language Models Is Mediated by a Single Direction

Jun 17, 2024
Via arXiv

Not All Language Model Features Are Linear

May 23, 2024
Via arXiv

Universal Neurons in GPT2 Language Models

Jan 22, 2024
Via arXiv

Training Dynamics of Contextual N-Grams in Language Models

Nov 01, 2023
Via arXiv

Language Models Represent Space and Time

Oct 03, 2023
Via arXiv

Finding Neurons in a Haystack: Case Studies with Sparse Probing

May 02, 2023
Via arXiv

Learning Sparse Nonlinear Dynamics via Mixed-Integer Optimization

Jun 01, 2022
Via arXiv