Jesse Mu

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Jan 17, 2024
Evan Hubinger, Carson Denison, Jesse Mu, Mike Lambert, Meg Tong, Monte MacDiarmid, Tamera Lanham, Daniel M. Ziegler, Tim Maxwell, Newton Cheng, Adam Jermyn, Amanda Askell, Ansh Radhakrishnan, Cem Anil, David Duvenaud, Deep Ganguli, Fazl Barez, Jack Clark, Kamal Ndousse, Kshitij Sachan, Michael Sellitto, Mrinank Sharma, Nova DasSarma, Roger Grosse, Shauna Kravec, Yuntao Bai, Zachary Witten, Marina Favaro, Jan Brauner, Holden Karnofsky, Paul Christiano, Samuel R. Bowman, Logan Graham, Jared Kaplan, Sören Mindermann, Ryan Greenblatt, Buck Shlegeris, Nicholas Schiefer, Ethan Perez

Characterizing tradeoffs between teaching via language and demonstrations in multi-agent systems

May 19, 2023
Dhara Yu, Noah D. Goodman, Jesse Mu

Learning to Compress Prompts with Gist Tokens

Apr 17, 2023
Jesse Mu, Xiang Lisa Li, Noah Goodman

Improving Policy Learning via Language Dynamics Distillation

Sep 30, 2022
Victor Zhong, Jesse Mu, Luke Zettlemoyer, Edward Grefenstette, Tim Rocktäschel

Active Learning Helps Pretrained Models Learn the Intended Task

Apr 18, 2022
Alex Tamkin, Dat Nguyen, Salil Deshpande, Jesse Mu, Noah Goodman

Improving Intrinsic Exploration with Language Abstractions

Feb 17, 2022
Jesse Mu, Victor Zhong, Roberta Raileanu, Minqi Jiang, Noah Goodman, Tim Rocktäschel, Edward Grefenstette

Calibrate your listeners! Robust communication-based training for pragmatic speakers

Oct 11, 2021
Rose E. Wang, Julia White, Jesse Mu, Noah D. Goodman

Emergent Communication of Generalizations

Jun 04, 2021
Jesse Mu, Noah Goodman

Compositional Explanations of Neurons

Jun 24, 2020
Jesse Mu, Jacob Andreas
