Atticus Geiger

pyvene: A Library for Understanding and Improving PyTorch Models via Interventions

Mar 12, 2024
Zhengxuan Wu, Atticus Geiger, Aryaman Arora, Jing Huang, Zheng Wang, Noah D. Goodman, Christopher D. Manning, Christopher Potts

Via arXiv

RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations

Feb 27, 2024
Jing Huang, Zhengxuan Wu, Christopher Potts, Mor Geva, Atticus Geiger

Via arXiv

A Reply to Makelov et al. (2023)'s "Interpretability Illusion" Arguments

Jan 23, 2024
Zhengxuan Wu, Atticus Geiger, Jing Huang, Aryaman Arora, Thomas Icard, Christopher Potts, Noah D. Goodman

Via arXiv

Linear Representations of Sentiment in Large Language Models

Oct 23, 2023
Curt Tigges, Oskar John Hollinsworth, Atticus Geiger, Neel Nanda

Via arXiv

Rigorously Assessing Natural Language Explanations of Neurons

Sep 19, 2023
Jing Huang, Atticus Geiger, Karel D'Oosterlinck, Zhengxuan Wu, Christopher Potts

Via arXiv

ScoNe: Benchmarking Negation Reasoning in Language Models With Fine-Tuning and In-Context Learning

May 30, 2023
Jingyuan Selena She, Christopher Potts, Samuel R. Bowman, Atticus Geiger

Via arXiv

Interpretability at Scale: Identifying Causal Mechanisms in Alpaca

May 15, 2023
Zhengxuan Wu, Atticus Geiger, Christopher Potts, Noah D. Goodman

Via arXiv

Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations

Mar 05, 2023
Atticus Geiger, Zhengxuan Wu, Christopher Potts, Thomas Icard, Noah D. Goodman

Via arXiv

Causal Abstraction for Faithful Model Interpretation

Jan 11, 2023
Atticus Geiger, Chris Potts, Thomas Icard

Via arXiv

Causal Abstraction with Soft Interventions

Nov 22, 2022
Riccardo Massidda, Atticus Geiger, Thomas Icard, Davide Bacciu

Via arXiv