Avoiding Side Effects in Complex Environments

Jun 11, 2020
Alexander Matt Turner, Neale Ratzlaff, Prasad Tadepalli

Reward function specification can be difficult, even in simple environments. Realistic environments contain millions of states. Rewarding the agent for making a widget may be easy, but penalizing the multitude of possible negative side effects is hard. In toy environments, Attainable Utility Preservation (AUP) avoids side effects by penalizing shifts in the ability to achieve randomly generated goals. We scale this approach to large, randomly generated environments based on Conway's Game of Life. By preserving optimal value for a single randomly generated reward function, AUP incurs modest overhead, completes the specified task, and avoids side effects.
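As a rough illustration of the penalty idea described above, the sketch below subtracts a scaled penalty from the task reward whenever an action shifts the agent's attainable value for one auxiliary reward function, measured against doing nothing. The function name `aup_reward`, its arguments, and the weight `lam` are hypothetical and chosen for illustration; the exact formulation and scaling used in the paper may differ.

```python
def aup_reward(task_reward, q_aux_action, q_aux_noop, lam=0.01):
    """Illustrative AUP-style shaped reward (an assumption, not the paper's code).

    task_reward  : reward for the specified task at this step
    q_aux_action : auxiliary Q-value of taking the chosen action
    q_aux_noop   : auxiliary Q-value of taking a no-op in the same state
    lam          : hypothetical penalty weight; larger means more conservative
    """
    # Penalize any shift, up or down, in the ability to achieve the auxiliary goal.
    penalty = abs(q_aux_action - q_aux_noop)
    return task_reward - lam * penalty


if __name__ == "__main__":
    # An action that leaves auxiliary value unchanged keeps the full task reward.
    print(aup_reward(1.0, q_aux_action=0.5, q_aux_noop=0.5))
    # An action that changes auxiliary value is penalized in proportion to the shift.
    print(aup_reward(1.0, q_aux_action=2.0, q_aux_noop=0.0, lam=0.5))
```

In this sketch the auxiliary Q-values are passed in as plain numbers; in practice they would come from a learned value function for the randomly generated auxiliary reward.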

* 16 pages with appendices 
