Karl Cobbe

Let's Verify Step by Step

May 31, 2023
Via arXiv

WebGPT: Browser-assisted question-answering with human feedback

Dec 17, 2021
Via arXiv

Training Verifiers to Solve Math Word Problems

Nov 18, 2021
Via arXiv

Batch size-invariance for policy optimization

Oct 01, 2021
Via arXiv

Measuring Sample Efficiency and Generalization in Reinforcement Learning Benchmarks: NeurIPS 2020 Procgen Benchmark

Mar 29, 2021
Via arXiv

Phasic Policy Gradient

Sep 09, 2020
Via arXiv

Leveraging Procedural Generation to Benchmark Reinforcement Learning

Dec 03, 2019
Via arXiv

Quantifying Generalization in Reinforcement Learning

Dec 20, 2018
Via arXiv