Atticus Geiger

ReFT: Representation Finetuning for Language Models
Apr 08, 2024
Zhengxuan Wu, Aryaman Arora, Zheng Wang, Atticus Geiger, Dan Jurafsky, Christopher D. Manning, Christopher Potts

pyvene: A Library for Understanding and Improving PyTorch Models via Interventions
Mar 12, 2024
Zhengxuan Wu, Atticus Geiger, Aryaman Arora, Jing Huang, Zheng Wang, Noah D. Goodman, Christopher D. Manning, Christopher Potts

RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations
Feb 27, 2024
Jing Huang, Zhengxuan Wu, Christopher Potts, Mor Geva, Atticus Geiger

A Reply to Makelov et al. (2023)'s "Interpretability Illusion" Arguments
Jan 23, 2024
Zhengxuan Wu, Atticus Geiger, Jing Huang, Aryaman Arora, Thomas Icard, Christopher Potts, Noah D. Goodman

Linear Representations of Sentiment in Large Language Models
Oct 23, 2023
Curt Tigges, Oskar John Hollinsworth, Atticus Geiger, Neel Nanda

Rigorously Assessing Natural Language Explanations of Neurons
Sep 19, 2023
Jing Huang, Atticus Geiger, Karel D'Oosterlinck, Zhengxuan Wu, Christopher Potts

ScoNe: Benchmarking Negation Reasoning in Language Models With Fine-Tuning and In-Context Learning
May 30, 2023
Jingyuan Selena She, Christopher Potts, Samuel R. Bowman, Atticus Geiger

Interpretability at Scale: Identifying Causal Mechanisms in Alpaca
May 15, 2023
Zhengxuan Wu, Atticus Geiger, Christopher Potts, Noah D. Goodman

Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations
Mar 05, 2023
Atticus Geiger, Zhengxuan Wu, Christopher Potts, Thomas Icard, Noah D. Goodman

Causal Abstraction for Faithful Model Interpretation
Jan 11, 2023
Atticus Geiger, Christopher Potts, Thomas Icard
