Scott Emmons

Evidence of Learned Look-Ahead in a Chess-Playing Neural Network

Jun 02, 2024

When Your AIs Deceive You: Challenges with Partial Observability of Human Evaluators in Reward Learning

Mar 03, 2024

Uncovering Latent Human Wellbeing in Language Model Embeddings

Feb 19, 2024

A StrongREJECT for Empty Jailbreaks

Feb 15, 2024

ALMANACS: A Simulatability Benchmark for Language Model Explainability

Dec 20, 2023

Image Hijacks: Adversarial Images can Control Generative Models at Runtime

Sep 18, 2023

Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark

Apr 06, 2023

imitation: Clean Imitation Learning Implementations

Nov 22, 2022

For Learning in Symmetric Teams, Local Optima are Global Nash Equilibria

Jul 07, 2022

An Empirical Investigation of Representation Learning for Imitation

May 16, 2022