Ethan Perez

Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning

Oct 19, 2023
Juan Rocamonde, Victoriano Montesinos, Elvis Nava, Ethan Perez, David Lindner

Studying Large Language Model Generalization with Influence Functions

Aug 07, 2023
Roger Grosse, Juhan Bae, Cem Anil, Nelson Elhage, Alex Tamkin, Amirhossein Tajdini, Benoit Steiner, Dustin Li, Esin Durmus, Ethan Perez, Evan Hubinger, Kamilė Lukošiūtė, Karina Nguyen, Nicholas Joseph, Sam McCandlish, Jared Kaplan, Samuel R. Bowman

Question Decomposition Improves the Faithfulness of Model-Generated Reasoning

Jul 25, 2023
Ansh Radhakrishnan, Karina Nguyen, Anna Chen, Carol Chen, Carson Denison, Danny Hernandez, Esin Durmus, Evan Hubinger, Jackson Kernion, Kamilė Lukošiūtė, Newton Cheng, Nicholas Joseph, Nicholas Schiefer, Oliver Rausch, Sam McCandlish, Sheer El Showk, Tamera Lanham, Tim Maxwell, Venkatesa Chandrasekaran, Zac Hatfield-Dodds, Jared Kaplan, Jan Brauner, Samuel R. Bowman, Ethan Perez

Measuring Faithfulness in Chain-of-Thought Reasoning

Jul 17, 2023
Tamera Lanham, Anna Chen, Ansh Radhakrishnan, Benoit Steiner, Carson Denison, Danny Hernandez, Dustin Li, Esin Durmus, Evan Hubinger, Jackson Kernion, Kamilė Lukošiūtė, Karina Nguyen, Newton Cheng, Nicholas Joseph, Nicholas Schiefer, Oliver Rausch, Robin Larson, Sam McCandlish, Sandipan Kundu, Saurav Kadavath, Shannon Yang, Thomas Henighan, Timothy Maxwell, Timothy Telleen-Lawton, Tristan Hume, Zac Hatfield-Dodds, Jared Kaplan, Jan Brauner, Samuel R. Bowman, Ethan Perez

Inverse Scaling: When Bigger Isn't Better

Jun 15, 2023
Ian R. McKenzie, Alexander Lyzhov, Michael Pieler, Alicia Parrish, Aaron Mueller, Ameya Prabhu, Euan McLean, Aaron Kirtland, Alexis Ross, Alisa Liu, Andrew Gritsevskiy, Daniel Wurgaft, Derik Kauffman, Gabriel Recchia, Jiacheng Liu, Joe Cavanagh, Max Weiss, Sicong Huang, The Floating Droid, Tom Tseng, Tomasz Korbak, Xudong Shen, Yuhui Zhang, Zhengping Zhou, Najoung Kim, Samuel R. Bowman, Ethan Perez

Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting

May 07, 2023
Miles Turpin, Julian Michael, Ethan Perez, Samuel R. Bowman

Training Language Models with Language Feedback at Scale

Apr 09, 2023
Jérémy Scheurer, Jon Ander Campos, Tomasz Korbak, Jun Shern Chan, Angelica Chen, Kyunghyun Cho, Ethan Perez

Improving Code Generation by Training with Natural Language Feedback

Mar 28, 2023
Angelica Chen, Jérémy Scheurer, Tomasz Korbak, Jon Ander Campos, Jun Shern Chan, Samuel R. Bowman, Kyunghyun Cho, Ethan Perez
