Diagnostics for Deep Neural Networks with Automated Copy/Paste Attacks


Nov 22, 2022
Stephen Casper, Kaivalya Hariharan, Dylan Hadfield-Menell


White-Box Adversarial Policies in Deep Reinforcement Learning


Sep 05, 2022
Stephen Casper, Dylan Hadfield-Menell, Gabriel Kreiman

* Code is available at https://github.com/thestephencasper/white_box_rarl 


Get It in Writing: Formal Contracts Mitigate Social Dilemmas in Multi-Agent RL


Aug 22, 2022
Phillip J. K. Christoffersen, Andreas A. Haupt, Dylan Hadfield-Menell

* 12 pages, 7 figures 


Towards Psychologically-Grounded Dynamic Preference Models


Aug 06, 2022
Mihaela Curmei, Andreas Haupt, Dylan Hadfield-Menell, Benjamin Recht

* In Sixteenth ACM Conference on Recommender Systems, September 18-23, 2022, Seattle, WA, USA, 14 pages 


Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks


Jul 28, 2022
Tilman Räuker, Anson Ho, Stephen Casper, Dylan Hadfield-Menell


Building Human Values into Recommender Systems: An Interdisciplinary Synthesis


Jul 20, 2022
Jonathan Stray, Alon Halevy, Parisa Assar, Dylan Hadfield-Menell, Craig Boutilier, Amar Ashar, Lex Beattie, Michael Ekstrand, Claire Leibowicz, Connie Moon Sehat, Sara Johansen, Lianne Kerlin, David Vickrey, Spandana Singh, Sanne Vrijenhoek, Amy Zhang, McKane Andrus, Natali Helberger, Polina Proutskova, Tanushree Mitra, Nina Vasan


How to talk so your robot will learn: Instructions, descriptions, and pragmatics


Jun 16, 2022
Theodore R Sumers, Robert D Hawkins, Mark K Ho, Thomas L Griffiths, Dylan Hadfield-Menell

* 9 pages, 4 figures 


Estimating and Penalizing Induced Preference Shifts in Recommender Systems


Apr 25, 2022
Micah Carroll, Dylan Hadfield-Menell, Stuart Russell, Anca Dragan


Linguistic communication as (inverse) reward design


Apr 11, 2022
Theodore R. Sumers, Robert D. Hawkins, Mark K. Ho, Thomas L. Griffiths, Dylan Hadfield-Menell

* 6 pages, 3 figures. Accepted at Learning from Natural Language Supervision workshop (ACL 2022) 
