Jacob Steinhardt

Mass-Producing Failures of Multimodal Systems with Language Models
Jun 21, 2023
Shengbang Tong, Erik Jones, Jacob Steinhardt
Via arXiv

Incentivizing High-Quality Content in Online Recommender Systems
Jun 14, 2023
Xinyan Hu, Meena Jagadeesan, Michael I. Jordan, Jacob Steinhardt
Via arXiv

Eliciting Latent Predictions from Transformers with the Tuned Lens
Mar 15, 2023
Nora Belrose, Zach Furman, Logan Smith, Danny Halawi, Igor Ostrovsky, Lev McKinney, Stella Biderman, Jacob Steinhardt
Via arXiv

Automatically Auditing Large Language Models via Discrete Optimization
Mar 08, 2023
Erik Jones, Anca Dragan, Aditi Raghunathan, Jacob Steinhardt
Via arXiv

Goal Driven Discovery of Distributional Differences via Language Descriptions
Feb 28, 2023
Ruiqi Zhong, Peter Zhang, Steve Li, Jinwoo Ahn, Dan Klein, Jacob Steinhardt
Via arXiv

Reward Learning as Doubly Nonparametric Bandits: Optimal Design and Scaling Laws
Feb 23, 2023
Kush Bhatia, Wenshuo Guo, Jacob Steinhardt
Via arXiv

Progress measures for grokking via mechanistic interpretability
Jan 13, 2023
Neel Nanda, Lawrence Chan, Tom Lieberum, Jess Smith, Jacob Steinhardt
Via arXiv

Discovering Latent Knowledge in Language Models Without Supervision
Dec 07, 2022
Collin Burns, Haotian Ye, Dan Klein, Jacob Steinhardt
Via arXiv

Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Nov 01, 2022
Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt
Via arXiv