Cem Anil

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Jan 17, 2024
Evan Hubinger, Carson Denison, Jesse Mu, Mike Lambert, Meg Tong, Monte MacDiarmid, Tamera Lanham, Daniel M. Ziegler, Tim Maxwell, Newton Cheng, Adam Jermyn, Amanda Askell, Ansh Radhakrishnan, Cem Anil, David Duvenaud, Deep Ganguli, Fazl Barez, Jack Clark, Kamal Ndousse, Kshitij Sachan, Michael Sellitto, Mrinank Sharma, Nova DasSarma, Roger Grosse, Shauna Kravec, Yuntao Bai, Zachary Witten, Marina Favaro, Jan Brauner, Holden Karnofsky, Paul Christiano, Samuel R. Bowman, Logan Graham, Jared Kaplan, Sören Mindermann, Ryan Greenblatt, Buck Shlegeris, Nicholas Schiefer, Ethan Perez

Studying Large Language Model Generalization with Influence Functions

Aug 07, 2023
Roger Grosse, Juhan Bae, Cem Anil, Nelson Elhage, Alex Tamkin, Amirhossein Tajdini, Benoit Steiner, Dustin Li, Esin Durmus, Ethan Perez, Evan Hubinger, Kamilė Lukošiūtė, Karina Nguyen, Nicholas Joseph, Sam McCandlish, Jared Kaplan, Samuel R. Bowman

Path Independent Equilibrium Models Can Better Exploit Test-Time Computation

Nov 18, 2022
Cem Anil, Ashwini Pokle, Kaiqu Liang, Johannes Treutlein, Yuhuai Wu, Shaojie Bai, Zico Kolter, Roger Grosse

Exploring Length Generalization in Large Language Models

Jul 11, 2022
Cem Anil, Yuhuai Wu, Anders Andreassen, Aitor Lewkowycz, Vedant Misra, Vinay Ramasesh, Ambrose Slone, Guy Gur-Ari, Ethan Dyer, Behnam Neyshabur

Solving Quantitative Reasoning Problems with Language Models

Jul 01, 2022
Aitor Lewkowycz, Anders Andreassen, David Dohan, Ethan Dyer, Henryk Michalewski, Vinay Ramasesh, Ambrose Slone, Cem Anil, Imanol Schlag, Theo Gutman-Solo, Yuhuai Wu, Behnam Neyshabur, Guy Gur-Ari, Vedant Misra

Learning to Give Checkable Answers with Prover-Verifier Games

Aug 27, 2021
Cem Anil, Guodong Zhang, Yuhuai Wu, Roger Grosse

Learning to Elect

Aug 07, 2021
Cem Anil, Xuchan Bao
