Jacob Steinhardt

Approaching Human-Level Forecasting with Language Models

Feb 28, 2024
Danny Halawi, Fred Zhang, Chen Yueh-Han, Jacob Steinhardt

Feedback Loops With Language Models Drive In-Context Reward Hacking

Feb 09, 2024
Alexander Pan, Erik Jones, Meena Jagadeesan, Jacob Steinhardt

Describing Differences in Image Sets with Natural Language

Dec 05, 2023
Lisa Dunlap, Yuhui Zhang, Xiaohan Wang, Ruiqi Zhong, Trevor Darrell, Jacob Steinhardt, Joseph E. Gonzalez, Serena Yeung-Levy

How do Language Models Bind Entities in Context?

Oct 26, 2023
Jiahai Feng, Jacob Steinhardt

Interpreting CLIP's Image Representation via Text-Based Decomposition

Oct 10, 2023
Yossi Gandelsman, Alexei A. Efros, Jacob Steinhardt

Overthinking the Truth: Understanding how Language Models Process False Demonstrations

Jul 18, 2023
Danny Halawi, Jean-Stanislas Denain, Jacob Steinhardt

Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations

Jul 17, 2023
Yanda Chen, Ruiqi Zhong, Narutatsu Ri, Chen Zhao, He He, Jacob Steinhardt, Zhou Yu, Kathleen McKeown

Jailbroken: How Does LLM Safety Training Fail?

Jul 05, 2023
Alexander Wei, Nika Haghtalab, Jacob Steinhardt

Are Neurons Actually Collapsed? On the Fine-Grained Structure in Neural Representations

Jun 29, 2023
Yongyi Yang, Jacob Steinhardt, Wei Hu

Improved Bayes Risk Can Yield Reduced Social Welfare Under Competition

Jun 26, 2023
Meena Jagadeesan, Michael I. Jordan, Jacob Steinhardt, Nika Haghtalab
