Scott Emmons

When Your AIs Deceive You: Challenges with Partial Observability of Human Evaluators in Reward Learning

Mar 03, 2024
Leon Lang, Davis Foote, Stuart Russell, Anca Dragan, Erik Jenner, Scott Emmons

Via arXiv

When Your AI Deceives You: Challenges with Partial Observability of Human Evaluators in Reward Learning

Feb 27, 2024
Leon Lang, Davis Foote, Stuart Russell, Anca Dragan, Erik Jenner, Scott Emmons

Via arXiv

Uncovering Latent Human Wellbeing in Language Model Embeddings

Feb 19, 2024
Pedro Freire, ChengCheng Tan, Adam Gleave, Dan Hendrycks, Scott Emmons

Via arXiv

A StrongREJECT for Empty Jailbreaks

Feb 15, 2024
Alexandra Souly, Qingyuan Lu, Dillon Bowen, Tu Trinh, Elvis Hsieh, Sana Pandey, Pieter Abbeel, Justin Svegliato, Scott Emmons, Olivia Watkins, Sam Toyer

Via arXiv

ALMANACS: A Simulatability Benchmark for Language Model Explainability

Dec 20, 2023
Edmund Mills, Shiye Su, Stuart Russell, Scott Emmons

Via arXiv

Image Hijacks: Adversarial Images can Control Generative Models at Runtime

Sep 18, 2023
Luke Bailey, Euan Ong, Stuart Russell, Scott Emmons

Via arXiv

Image Hijacking: Adversarial Images can Control Generative Models at Runtime

Sep 01, 2023
Luke Bailey, Euan Ong, Stuart Russell, Scott Emmons

Via arXiv

Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark

Apr 06, 2023
Alexander Pan, Chan Jun Shern, Andy Zou, Nathaniel Li, Steven Basart, Thomas Woodside, Jonathan Ng, Hanlin Zhang, Scott Emmons, Dan Hendrycks

Via arXiv

imitation: Clean Imitation Learning Implementations

Nov 22, 2022
Adam Gleave, Mohammad Taufeeque, Juan Rocamonde, Erik Jenner, Steven H. Wang, Sam Toyer, Maximilian Ernestus, Nora Belrose, Scott Emmons, Stuart Russell

Via arXiv

For Learning in Symmetric Teams, Local Optima are Global Nash Equilibria

Jul 07, 2022
Scott Emmons, Caspar Oesterheld, Andrew Critch, Vincent Conitzer, Stuart Russell

Via arXiv