Jacob Andreas

The Consensus Game: Language Model Generation via Equilibrium Search

Oct 13, 2023

A Function Interpretation Benchmark for Evaluating Interpretability Methods

Sep 07, 2023

Linearity of Relation Decoding in Transformer Language Models

Aug 17, 2023

Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks

Aug 01, 2023

The Clock and the Pizza: Two Stories in Mechanistic Explanation of Neural Networks

Jun 30, 2023

From Word Models to World Models: Translating from Natural Language to the Probabilistic Language of Thought

Jun 23, 2023

Decision-Oriented Dialogue for Human-AI Collaboration

Jun 01, 2023

Grokking of Hierarchical Structure in Vanilla Transformers

May 30, 2023

Natural Language Decomposition and Interpretation of Complex Utterances

May 15, 2023

Measuring and Manipulating Knowledge Representations in Language Models

Apr 03, 2023