Roger Grosse

REFACTOR: Learning to Extract Theorems from Proofs

Feb 26, 2024
Jin Peng Zhou, Yuhuai Wu, Qiyang Li, Roger Grosse

Via arXiv

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Jan 17, 2024
Evan Hubinger, Carson Denison, Jesse Mu, Mike Lambert, Meg Tong, Monte MacDiarmid, Tamera Lanham, Daniel M. Ziegler, Tim Maxwell, Newton Cheng, Adam Jermyn, Amanda Askell, Ansh Radhakrishnan, Cem Anil, David Duvenaud, Deep Ganguli, Fazl Barez, Jack Clark, Kamal Ndousse, Kshitij Sachan, Michael Sellitto, Mrinank Sharma, Nova DasSarma, Roger Grosse, Shauna Kravec, Yuntao Bai, Zachary Witten, Marina Favaro, Jan Brauner, Holden Karnofsky, Paul Christiano, Samuel R. Bowman, Logan Graham, Jared Kaplan, Sören Mindermann, Ryan Greenblatt, Buck Shlegeris, Nicholas Schiefer, Ethan Perez

Via arXiv

Studying Large Language Model Generalization with Influence Functions

Aug 07, 2023
Roger Grosse, Juhan Bae, Cem Anil, Nelson Elhage, Alex Tamkin, Amirhossein Tajdini, Benoit Steiner, Dustin Li, Esin Durmus, Ethan Perez, Evan Hubinger, Kamilė Lukošiūtė, Karina Nguyen, Nicholas Joseph, Sam McCandlish, Jared Kaplan, Samuel R. Bowman

Via arXiv

Improving Mutual Information Estimation with Annealed and Energy-Based Bounds

Mar 13, 2023
Rob Brekelmans, Sicong Huang, Marzyeh Ghassemi, Greg Ver Steeg, Roger Grosse, Alireza Makhzani

Via arXiv

Efficient Parametric Approximations of Neural Network Function Space Distance

Feb 07, 2023
Nikita Dhawan, Sicong Huang, Juhan Bae, Roger Grosse

Via arXiv

On Implicit Bias in Overparameterized Bilevel Optimization

Dec 28, 2022
Paul Vicol, Jonathan Lorraine, Fabian Pedregosa, David Duvenaud, Roger Grosse

Via arXiv

Discovering Language Model Behaviors with Model-Written Evaluations

Dec 19, 2022
Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, Andy Jones, Anna Chen, Ben Mann, Brian Israel, Bryan Seethor, Cameron McKinnon, Christopher Olah, Da Yan, Daniela Amodei, Dario Amodei, Dawn Drain, Dustin Li, Eli Tran-Johnson, Guro Khundadze, Jackson Kernion, James Landis, Jamie Kerr, Jared Mueller, Jeeyoon Hyun, Joshua Landau, Kamal Ndousse, Landon Goldberg, Liane Lovitt, Martin Lucas, Michael Sellitto, Miranda Zhang, Neerav Kingsland, Nelson Elhage, Nicholas Joseph, Noemí Mercado, Nova DasSarma, Oliver Rausch, Robin Larson, Sam McCandlish, Scott Johnston, Shauna Kravec, Sheer El Showk, Tamera Lanham, Timothy Telleen-Lawton, Tom Brown, Tom Henighan, Tristan Hume, Yuntao Bai, Zac Hatfield-Dodds, Jack Clark, Samuel R. Bowman, Amanda Askell, Roger Grosse, Danny Hernandez, Deep Ganguli, Evan Hubinger, Nicholas Schiefer, Jared Kaplan

Via arXiv

Multi-Rate VAE: Train Once, Get the Full Rate-Distortion Curve

Dec 07, 2022
Juhan Bae, Michael R. Zhang, Michael Ruan, Eric Wang, So Hasegawa, Jimmy Ba, Roger Grosse

Via arXiv

Similarity-based Cooperation

Nov 26, 2022
Caspar Oesterheld, Johannes Treutlein, Roger Grosse, Vincent Conitzer, Jakob Foerster

Via arXiv