Alert button
Picture for Jérémy Scheurer

Jérémy Scheurer

Alert button

Black-Box Access is Insufficient for Rigorous AI Audits

Jan 25, 2024
Stephen Casper, Carson Ezell, Charlotte Siegmann, Noam Kolt, Taylor Lynn Curtis, Benjamin Bucknall, Andreas Haupt, Kevin Wei, Jérémy Scheurer, Marius Hobbhahn, Lee Sharkey, Satyapriya Krishna, Marvin Von Hagen, Silas Alberti, Alan Chan, Qinyi Sun, Michael Gerovitch, David Bau, Max Tegmark, David Krueger, Dylan Hadfield-Menell

Viaarxiv icon

Technical Report: Large Language Models can Strategically Deceive their Users when Put Under Pressure

Nov 27, 2023
Jérémy Scheurer, Mikita Balesni, Marius Hobbhahn

Viaarxiv icon

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

Jul 27, 2023
Stephen Casper, Xander Davies, Claudia Shi, Thomas Krendl Gilbert, Jérémy Scheurer, Javier Rando, Rachel Freedman, Tomasz Korbak, David Lindner, Pedro Freire, Tony Wang, Samuel Marks, Charbel-Raphaël Segerie, Micah Carroll, Andi Peng, Phillip Christoffersen, Mehul Damani, Stewart Slocum, Usman Anwar, Anand Siththaranjan, Max Nadeau, Eric J. Michaud, Jacob Pfau, Dmitrii Krasheninnikov, Xin Chen, Lauro Langosco, Peter Hase, Erdem Bıyık, Anca Dragan, David Krueger, Dorsa Sadigh, Dylan Hadfield-Menell

Figure 1 for Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Figure 2 for Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Figure 3 for Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Figure 4 for Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Viaarxiv icon

Training Language Models with Language Feedback at Scale

Apr 09, 2023
Jérémy Scheurer, Jon Ander Campos, Tomasz Korbak, Jun Shern Chan, Angelica Chen, Kyunghyun Cho, Ethan Perez

Figure 1 for Training Language Models with Language Feedback at Scale
Figure 2 for Training Language Models with Language Feedback at Scale
Figure 3 for Training Language Models with Language Feedback at Scale
Figure 4 for Training Language Models with Language Feedback at Scale
Viaarxiv icon

Improving Code Generation by Training with Natural Language Feedback

Mar 28, 2023
Angelica Chen, Jérémy Scheurer, Tomasz Korbak, Jon Ander Campos, Jun Shern Chan, Samuel R. Bowman, Kyunghyun Cho, Ethan Perez

Figure 1 for Improving Code Generation by Training with Natural Language Feedback
Figure 2 for Improving Code Generation by Training with Natural Language Feedback
Figure 3 for Improving Code Generation by Training with Natural Language Feedback
Figure 4 for Improving Code Generation by Training with Natural Language Feedback
Viaarxiv icon

Few-shot Adaptation Works with UnpredicTable Data

Aug 08, 2022
Jun Shern Chan, Michael Pieler, Jonathan Jao, Jérémy Scheurer, Ethan Perez

Figure 1 for Few-shot Adaptation Works with UnpredicTable Data
Figure 2 for Few-shot Adaptation Works with UnpredicTable Data
Figure 3 for Few-shot Adaptation Works with UnpredicTable Data
Figure 4 for Few-shot Adaptation Works with UnpredicTable Data
Viaarxiv icon

Training Language Models with Natural Language Feedback

May 02, 2022
Jérémy Scheurer, Jon Ander Campos, Jun Shern Chan, Angelica Chen, Kyunghyun Cho, Ethan Perez

Figure 1 for Training Language Models with Natural Language Feedback
Figure 2 for Training Language Models with Natural Language Feedback
Figure 3 for Training Language Models with Natural Language Feedback
Figure 4 for Training Language Models with Natural Language Feedback
Viaarxiv icon

Learning from Natural Language Feedback

Apr 29, 2022
Jérémy Scheurer, Jon Ander Campos, Jun Shern Chan, Angelica Chen, Kyunghyun Cho, Ethan Perez

Figure 1 for Learning from Natural Language Feedback
Figure 2 for Learning from Natural Language Feedback
Figure 3 for Learning from Natural Language Feedback
Figure 4 for Learning from Natural Language Feedback
Viaarxiv icon