Jacob Steinhardt

Approaching Human-Level Forecasting with Language Models

Feb 28, 2024

Feedback Loops With Language Models Drive In-Context Reward Hacking

Feb 09, 2024

Describing Differences in Image Sets with Natural Language

Dec 05, 2023

How do Language Models Bind Entities in Context?

Oct 26, 2023

Interpreting CLIP's Image Representation via Text-Based Decomposition

Oct 10, 2023

Overthinking the Truth: Understanding how Language Models Process False Demonstrations

Jul 18, 2023

Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations

Jul 17, 2023

Jailbroken: How Does LLM Safety Training Fail?

Jul 05, 2023

Are Neurons Actually Collapsed? On the Fine-Grained Structure in Neural Representations

Jun 29, 2023

Improved Bayes Risk Can Yield Reduced Social Welfare Under Competition

Jun 26, 2023