Benchmarking Interpretability Tools for Deep Neural Networks


Feb 08, 2023
Stephen Casper, Yuxiao Li, Jiawei Li, Tong Bu, Kevin Zhang, Dylan Hadfield-Menell


Diagnostics for Deep Neural Networks with Automated Copy/Paste Attacks


Nov 22, 2022
Stephen Casper, Kaivalya Hariharan, Dylan Hadfield-Menell


White-Box Adversarial Policies in Deep Reinforcement Learning


Sep 05, 2022
Stephen Casper, Dylan Hadfield-Menell, Gabriel Kreiman

* Code is available at https://github.com/thestephencasper/white_box_rarl 


Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks


Jul 28, 2022
Tilman Räuker, Anson Ho, Stephen Casper, Dylan Hadfield-Menell


Detecting Modularity in Deep Neural Networks


Oct 13, 2021
Shlomi Hod, Stephen Casper, Daniel Filan, Cody Wild, Andrew Critch, Stuart Russell

* Code is available at https://github.com/thestephencasper/detecting_nn_modularity 


One Thing to Fool them All: Generating Interpretable, Universal, and Physically-Realizable Adversarial Features


Oct 11, 2021
Stephen Casper, Max Nadeau, Gabriel Kreiman

* Code is available at https://github.com/thestephencasper/feature_fool


Clusterability in Neural Networks


Mar 04, 2021
Daniel Filan, Stephen Casper, Shlomi Hod, Cody Wild, Andrew Critch, Stuart Russell

* 20 pages, 22 figures. arXiv admin note: text overlap with arXiv:2003.04881 


The Achilles Heel Hypothesis: Pitfalls for AI Systems via Decision Theoretic Adversaries


Oct 12, 2020
Stephen Casper

* Contact info for author at stephencasper.com 


Probing Neural Dialog Models for Conversational Understanding


Jun 07, 2020
Abdelrhman Saleh, Tovly Deutsch, Stephen Casper, Yonatan Belinkov, Stuart Shieber
