John Schulman

Let's Verify Step by Step

May 31, 2023
Hunter Lightman, Vineet Kosaraju, Yura Burda, Harri Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, Karl Cobbe

Via arXiv

Scaling laws for single-agent reinforcement learning

Jan 31, 2023
Jacob Hilton, Jie Tang, John Schulman

Via arXiv

Scaling Laws for Reward Model Overoptimization

Oct 19, 2022
Leo Gao, John Schulman, Jacob Hilton

Via arXiv

Efficient Training of Language Models to Fill in the Middle

Jul 28, 2022
Mohammad Bavarian, Heewoo Jun, Nikolas Tezak, John Schulman, Christine McLeavey, Jerry Tworek, Mark Chen

Via arXiv

Training language models to follow instructions with human feedback

Mar 04, 2022
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan Lowe

Via arXiv

WebGPT: Browser-assisted question-answering with human feedback

Dec 17, 2021
Reiichiro Nakano, Jacob Hilton, Suchir Balaji, Jeff Wu, Long Ouyang, Christina Kim, Christopher Hesse, Shantanu Jain, Vineet Kosaraju, William Saunders, Xu Jiang, Karl Cobbe, Tyna Eloundou, Gretchen Krueger, Kevin Button, Matthew Knight, Benjamin Chess, John Schulman

Via arXiv

Training Verifiers to Solve Math Word Problems

Nov 18, 2021
Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, Christopher Hesse, John Schulman

Via arXiv

Batch size-invariance for policy optimization

Oct 01, 2021
Jacob Hilton, Karl Cobbe, John Schulman

Via arXiv

Unsolved Problems in ML Safety

Sep 28, 2021
Dan Hendrycks, Nicholas Carlini, John Schulman, Jacob Steinhardt

Via arXiv