Jacob Steinhardt

Approaching Human-Level Forecasting with Language Models

Feb 28, 2024
Danny Halawi, Fred Zhang, Chen Yueh-Han, Jacob Steinhardt

Feedback Loops With Language Models Drive In-Context Reward Hacking

Feb 09, 2024
Alexander Pan, Erik Jones, Meena Jagadeesan, Jacob Steinhardt

Describing Differences in Image Sets with Natural Language

Dec 05, 2023
Lisa Dunlap, Yuhui Zhang, Xiaohan Wang, Ruiqi Zhong, Trevor Darrell, Jacob Steinhardt, Joseph E. Gonzalez, Serena Yeung-Levy

How do Language Models Bind Entities in Context?

Oct 26, 2023
Jiahai Feng, Jacob Steinhardt

Interpreting CLIP's Image Representation via Text-Based Decomposition

Oct 10, 2023
Yossi Gandelsman, Alexei A. Efros, Jacob Steinhardt

Overthinking the Truth: Understanding how Language Models Process False Demonstrations

Jul 18, 2023
Danny Halawi, Jean-Stanislas Denain, Jacob Steinhardt

Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations

Jul 17, 2023
Yanda Chen, Ruiqi Zhong, Narutatsu Ri, Chen Zhao, He He, Jacob Steinhardt, Zhou Yu, Kathleen McKeown

Jailbroken: How Does LLM Safety Training Fail?

Jul 05, 2023
Alexander Wei, Nika Haghtalab, Jacob Steinhardt

Are Neurons Actually Collapsed? On the Fine-Grained Structure in Neural Representations

Jun 29, 2023
Yongyi Yang, Jacob Steinhardt, Wei Hu

Improved Bayes Risk Can Yield Reduced Social Welfare Under Competition

Jun 26, 2023
Meena Jagadeesan, Michael I. Jordan, Jacob Steinhardt, Nika Haghtalab
