Jonathan Tu

Understanding the Inner Workings of Language Models Through Representation Dissimilarity

Oct 23, 2023
Via arXiv

Attributing Learned Concepts in Neural Networks to Training Data

Oct 06, 2023
Via arXiv

Robustness of edited neural networks

Feb 28, 2023
Via arXiv