Training language models to follow instructions with human feedback



Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan Lowe


WebGPT: Browser-assisted question-answering with human feedback



Reiichiro Nakano, Jacob Hilton, Suchir Balaji, Jeff Wu, Long Ouyang, Christina Kim, Christopher Hesse, Shantanu Jain, Vineet Kosaraju, William Saunders, Xu Jiang, Karl Cobbe, Tyna Eloundou, Gretchen Krueger, Kevin Button, Matthew Knight, Benjamin Chess, John Schulman

* 30 pages 

Training Verifiers to Solve Math Word Problems



Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, Christopher Hesse, John Schulman


Batch size-invariance for policy optimization



Jacob Hilton, Karl Cobbe, John Schulman

* Submitted to ICLR 2022. 27 pages. Code is available at https://github.com/openai/ppo-ewma 

Unsolved Problems in ML Safety



Dan Hendrycks, Nicholas Carlini, John Schulman, Jacob Steinhardt

* Position Paper 

Measuring Sample Efficiency and Generalization in Reinforcement Learning Benchmarks: NeurIPS 2020 Procgen Benchmark



Sharada Mohanty, Jyotish Poonganam, Adrien Gaidon, Andrey Kolobov, Blake Wulfe, Dipam Chakraborty, Gražvydas Šemetulskis, João Schapke, Jonas Kubilius, Jurgis Pašukonis, Linas Klimas, Matthew Hausknecht, Patrick MacAlpine, Quang Nhat Tran, Thomas Tumiel, Xiaocheng Tang, Xinwei Chen, Christopher Hesse, Jacob Hilton, William Hebgen Guss, Sahika Genc, John Schulman, Karl Cobbe


The MineRL 2020 Competition on Sample Efficient Reinforcement Learning using Human Priors



William H. Guss, Mario Ynocente Castro, Sam Devlin, Brandon Houghton, Noboru Sean Kuno, Crissman Loomis, Stephanie Milani, Sharada Mohanty, Keisuke Nakata, Ruslan Salakhutdinov, John Schulman, Shinya Shiroshita, Nicholay Topin, Avinash Ummadisingu, Oriol Vinyals

* 37 pages, initial submission, accepted at NeurIPS. arXiv admin note: substantial text overlap with arXiv:1904.10079 

Scaling Laws for Autoregressive Generative Modeling



Tom Henighan, Jared Kaplan, Mor Katz, Mark Chen, Christopher Hesse, Jacob Jackson, Heewoo Jun, Tom B. Brown, Prafulla Dhariwal, Scott Gray, Chris Hallacy, Benjamin Mann, Alec Radford, Aditya Ramesh, Nick Ryder, Daniel M. Ziegler, John Schulman, Dario Amodei, Sam McCandlish

* 20+17 pages, 33 figures; added appendix with additional language results 
