Zhengxuan Wu

ReFT: Representation Finetuning for Language Models

Apr 08, 2024

Mapping the Increasing Use of LLMs in Scientific Papers

Apr 01, 2024

In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation

Mar 12, 2024

pyvene: A Library for Understanding and Improving PyTorch Models via Interventions

Mar 12, 2024

RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations

Feb 27, 2024

A Reply to Makelov et al.'s "Interpretability Illusion" Arguments

Jan 23, 2024

Rigorously Assessing Natural Language Explanations of Neurons

Sep 19, 2023

MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions

May 24, 2023

Interpretability at Scale: Identifying Causal Mechanisms in Alpaca

May 15, 2023

ReCOGS: How Incidental Details of a Logical Form Overshadow an Evaluation of Semantic Interpretation

Mar 24, 2023