Zhengxuan Wu

pyvene: A Library for Understanding and Improving PyTorch Models via Interventions

Mar 12, 2024
Zhengxuan Wu, Atticus Geiger, Aryaman Arora, Jing Huang, Zheng Wang, Noah D. Goodman, Christopher D. Manning, Christopher Potts

In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation

Mar 12, 2024
Shiqi Chen, Miao Xiong, Junteng Liu, Zhengxuan Wu, Teng Xiao, Siyang Gao, Junxian He

RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations

Feb 27, 2024
Jing Huang, Zhengxuan Wu, Christopher Potts, Mor Geva, Atticus Geiger

A Reply to Makelov et al. (2023)'s "Interpretability Illusion" Arguments

Jan 23, 2024
Zhengxuan Wu, Atticus Geiger, Jing Huang, Aryaman Arora, Thomas Icard, Christopher Potts, Noah D. Goodman

Rigorously Assessing Natural Language Explanations of Neurons

Sep 19, 2023
Jing Huang, Atticus Geiger, Karel D'Oosterlinck, Zhengxuan Wu, Christopher Potts

MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions

May 24, 2023
Zexuan Zhong, Zhengxuan Wu, Christopher D. Manning, Christopher Potts, Danqi Chen

Interpretability at Scale: Identifying Causal Mechanisms in Alpaca

May 15, 2023
Zhengxuan Wu, Atticus Geiger, Christopher Potts, Noah D. Goodman

ReCOGS: How Incidental Details of a Logical Form Overshadow an Evaluation of Semantic Interpretation

Mar 24, 2023
Zhengxuan Wu, Christopher D. Manning, Christopher Potts

Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations

Mar 05, 2023
Atticus Geiger, Zhengxuan Wu, Christopher Potts, Thomas Icard, Noah D. Goodman
