Jesse Mu

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Jan 17, 2024
Via arXiv

Characterizing tradeoffs between teaching via language and demonstrations in multi-agent systems

May 19, 2023
Via arXiv

Learning to Compress Prompts with Gist Tokens

Apr 17, 2023
Via arXiv

Improving Policy Learning via Language Dynamics Distillation

Sep 30, 2022
Via arXiv

Active Learning Helps Pretrained Models Learn the Intended Task

Apr 18, 2022
Via arXiv

Improving Intrinsic Exploration with Language Abstractions

Feb 17, 2022
Via arXiv

Calibrate your listeners! Robust communication-based training for pragmatic speakers

Oct 11, 2021
Via arXiv

Emergent Communication of Generalizations

Jun 04, 2021
Via arXiv

Compositional Explanations of Neurons

Jun 24, 2020
Via arXiv

Learning to refer informatively by amortizing pragmatic reasoning

May 31, 2020
Via arXiv