
David Duvenaud

Experts Don't Cheat: Learning What You Don't Know By Predicting Pairs

Feb 13, 2024
Daniel D. Johnson, Daniel Tarlow, David Duvenaud, Chris J. Maddison

Via arXiv

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Jan 17, 2024
Evan Hubinger, Carson Denison, Jesse Mu, Mike Lambert, Meg Tong, Monte MacDiarmid, Tamera Lanham, Daniel M. Ziegler, Tim Maxwell, Newton Cheng, Adam Jermyn, Amanda Askell, Ansh Radhakrishnan, Cem Anil, David Duvenaud, Deep Ganguli, Fazl Barez, Jack Clark, Kamal Ndousse, Kshitij Sachan, Michael Sellitto, Mrinank Sharma, Nova DasSarma, Roger Grosse, Shauna Kravec, Yuntao Bai, Zachary Witten, Marina Favaro, Jan Brauner, Holden Karnofsky, Paul Christiano, Samuel R. Bowman, Logan Graham, Jared Kaplan, Sören Mindermann, Ryan Greenblatt, Buck Shlegeris, Nicholas Schiefer, Ethan Perez

Via arXiv

Sorting Out Quantum Monte Carlo

Nov 09, 2023
Jack Richter-Powell, Luca Thiede, Alán Aspuru-Guzik, David Duvenaud

Via arXiv

Towards Understanding Sycophancy in Language Models

Oct 27, 2023
Mrinank Sharma, Meg Tong, Tomasz Korbak, David Duvenaud, Amanda Askell, Samuel R. Bowman, Newton Cheng, Esin Durmus, Zac Hatfield-Dodds, Scott R. Johnston, Shauna Kravec, Timothy Maxwell, Sam McCandlish, Kamal Ndousse, Oliver Rausch, Nicholas Schiefer, Da Yan, Miranda Zhang, Ethan Perez

Via arXiv

Tools for Verifying Neural Models' Training Data

Jul 02, 2023
Dami Choi, Yonadav Shavit, David Duvenaud

Via arXiv

On Implicit Bias in Overparameterized Bilevel Optimization

Dec 28, 2022
Paul Vicol, Jonathan Lorraine, Fabian Pedregosa, David Duvenaud, Roger Grosse

Via arXiv

Meta-Learning to Improve Pre-Training

Nov 02, 2021
Aniruddh Raghu, Jonathan Lorraine, Simon Kornblith, Matthew McDermott, David Duvenaud

Via arXiv