Zhengxuan Wu

ReFT: Representation Finetuning for Language Models

Apr 08, 2024
Zhengxuan Wu, Aryaman Arora, Zheng Wang, Atticus Geiger, Dan Jurafsky, Christopher D. Manning, Christopher Potts

Mapping the Increasing Use of LLMs in Scientific Papers

Apr 01, 2024
Weixin Liang, Yaohui Zhang, Zhengxuan Wu, Haley Lepp, Wenlong Ji, Xuandong Zhao, Hancheng Cao, Sheng Liu, Siyu He, Zhi Huang, Diyi Yang, Christopher Potts, Christopher D. Manning, James Y. Zou

pyvene: A Library for Understanding and Improving PyTorch Models via Interventions

Mar 12, 2024
Zhengxuan Wu, Atticus Geiger, Aryaman Arora, Jing Huang, Zheng Wang, Noah D. Goodman, Christopher D. Manning, Christopher Potts

In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation

Mar 12, 2024
Shiqi Chen, Miao Xiong, Junteng Liu, Zhengxuan Wu, Teng Xiao, Siyang Gao, Junxian He

RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations

Feb 27, 2024
Jing Huang, Zhengxuan Wu, Christopher Potts, Mor Geva, Atticus Geiger

A Reply to Makelov et al. (2023)'s "Interpretability Illusion" Arguments

Jan 23, 2024
Zhengxuan Wu, Atticus Geiger, Jing Huang, Aryaman Arora, Thomas Icard, Christopher Potts, Noah D. Goodman

Rigorously Assessing Natural Language Explanations of Neurons

Sep 19, 2023
Jing Huang, Atticus Geiger, Karel D'Oosterlinck, Zhengxuan Wu, Christopher Potts

MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions

May 24, 2023
Zexuan Zhong, Zhengxuan Wu, Christopher D. Manning, Christopher Potts, Danqi Chen

Interpretability at Scale: Identifying Causal Mechanisms in Alpaca

May 15, 2023
Zhengxuan Wu, Atticus Geiger, Christopher Potts, Noah D. Goodman
