Jacob Hilton

Training language models to follow instructions with human feedback


Mar 04, 2022
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan Lowe

WebGPT: Browser-assisted question-answering with human feedback


Dec 17, 2021
Reiichiro Nakano, Jacob Hilton, Suchir Balaji, Jeff Wu, Long Ouyang, Christina Kim, Christopher Hesse, Shantanu Jain, Vineet Kosaraju, William Saunders, Xu Jiang, Karl Cobbe, Tyna Eloundou, Gretchen Krueger, Kevin Button, Matthew Knight, Benjamin Chess, John Schulman

* 30 pages 

Training Verifiers to Solve Math Word Problems


Nov 18, 2021
Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, Christopher Hesse, John Schulman

Batch size-invariance for policy optimization


Oct 01, 2021
Jacob Hilton, Karl Cobbe, John Schulman

* Submitted to ICLR 2022. 27 pages. Code is available at https://github.com/openai/ppo-ewma 

TruthfulQA: Measuring How Models Mimic Human Falsehoods


Sep 08, 2021
Stephanie Lin, Jacob Hilton, Owain Evans

* The TruthfulQA benchmark and evaluation code are available at https://github.com/sylinrl/TruthfulQA 

Measuring Sample Efficiency and Generalization in Reinforcement Learning Benchmarks: NeurIPS 2020 Procgen Benchmark


Mar 29, 2021
Sharada Mohanty, Jyotish Poonganam, Adrien Gaidon, Andrey Kolobov, Blake Wulfe, Dipam Chakraborty, Gražvydas Šemetulskis, João Schapke, Jonas Kubilius, Jurgis Pašukonis, Linas Klimas, Matthew Hausknecht, Patrick MacAlpine, Quang Nhat Tran, Thomas Tumiel, Xiaocheng Tang, Xinwei Chen, Christopher Hesse, Jacob Hilton, William Hebgen Guss, Sahika Genc, John Schulman, Karl Cobbe

Phasic Policy Gradient


Sep 09, 2020
Karl Cobbe, Jacob Hilton, Oleg Klimov, John Schulman

Leveraging Procedural Generation to Benchmark Reinforcement Learning


Dec 03, 2019
Karl Cobbe, Christopher Hesse, Jacob Hilton, John Schulman