Karl Cobbe

Let's Verify Step by Step

May 31, 2023
Hunter Lightman, Vineet Kosaraju, Yura Burda, Harri Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, Karl Cobbe

Via arXiv

WebGPT: Browser-assisted question-answering with human feedback

Dec 17, 2021
Reiichiro Nakano, Jacob Hilton, Suchir Balaji, Jeff Wu, Long Ouyang, Christina Kim, Christopher Hesse, Shantanu Jain, Vineet Kosaraju, William Saunders, Xu Jiang, Karl Cobbe, Tyna Eloundou, Gretchen Krueger, Kevin Button, Matthew Knight, Benjamin Chess, John Schulman

Via arXiv

Training Verifiers to Solve Math Word Problems

Nov 18, 2021
Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, Christopher Hesse, John Schulman

Via arXiv

Batch size-invariance for policy optimization

Oct 01, 2021
Jacob Hilton, Karl Cobbe, John Schulman

Via arXiv

Measuring Sample Efficiency and Generalization in Reinforcement Learning Benchmarks: NeurIPS 2020 Procgen Benchmark

Mar 29, 2021
Sharada Mohanty, Jyotish Poonganam, Adrien Gaidon, Andrey Kolobov, Blake Wulfe, Dipam Chakraborty, Gražvydas Šemetulskis, João Schapke, Jonas Kubilius, Jurgis Pašukonis, Linas Klimas, Matthew Hausknecht, Patrick MacAlpine, Quang Nhat Tran, Thomas Tumiel, Xiaocheng Tang, Xinwei Chen, Christopher Hesse, Jacob Hilton, William Hebgen Guss, Sahika Genc, John Schulman, Karl Cobbe

Via arXiv

Phasic Policy Gradient

Sep 09, 2020
Karl Cobbe, Jacob Hilton, Oleg Klimov, John Schulman

Via arXiv

Leveraging Procedural Generation to Benchmark Reinforcement Learning

Dec 03, 2019
Karl Cobbe, Christopher Hesse, Jacob Hilton, John Schulman

Via arXiv

Quantifying Generalization in Reinforcement Learning

Dec 20, 2018
Karl Cobbe, Oleg Klimov, Chris Hesse, Taehoon Kim, John Schulman

Via arXiv