Shashwat Goel

The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning

Mar 06, 2024
Nathaniel Li, Alexander Pan, Anjali Gopal, Summer Yue, Daniel Berrios, Alice Gatti, Justin D. Li, Ann-Kathrin Dombrowski, Shashwat Goel, Long Phan, Gabriel Mukobi, Nathan Helm-Burger, Rassin Lababidi, Lennart Justen, Andrew B. Liu, Michael Chen, Isabelle Barrass, Oliver Zhang, Xiaoyuan Zhu, Rishub Tamirisa, Bhrugu Bharathi, Adam Khoja, Zhenqi Zhao, Ariel Herbert-Voss, Cort B. Breuer, Andy Zou, Mantas Mazeika, Zifan Wang, Palash Oswal, Weiran Liu, Adam A. Hunt, Justin Tienken-Harder, Kevin Y. Shih, Kemper Talley, John Guan, Russell Kaplan, Ian Steneker, David Campbell, Brad Jokubaitis, Alex Levinson, Jean Wang, William Qian, Kallol Krishna Karmakar, Steven Basart, Stephen Fitz, Mindy Levine, Ponnurangam Kumaraguru, Uday Tupakula, Vijay Varadharajan, Yan Shoshitaishvili, Jimmy Ba, Kevin M. Esvelt, Alexandr Wang, Dan Hendrycks

Via arXiv

Corrective Machine Unlearning

Feb 21, 2024
Shashwat Goel, Ameya Prabhu, Philip Torr, Ponnurangam Kumaraguru, Amartya Sanyal

Via arXiv

Representation Engineering: A Top-Down Approach to AI Transparency

Oct 10, 2023
Andy Zou, Long Phan, Sarah Chen, James Campbell, Phillip Guo, Richard Ren, Alexander Pan, Xuwang Yin, Mantas Mazeika, Ann-Kathrin Dombrowski, Shashwat Goel, Nathaniel Li, Michael J. Byun, Zifan Wang, Alex Mallen, Steven Basart, Sanmi Koyejo, Dawn Song, Matt Fredrikson, J. Zico Kolter, Dan Hendrycks

Via arXiv

Proportional Aggregation of Preferences for Sequential Decision Making

Jun 26, 2023
Nikhil Chandak, Shashwat Goel, Dominik Peters

Via arXiv

Low impact agency: review and discussion

Mar 06, 2023
Danilo Naiff, Shashwat Goel

Via arXiv

Evaluating Inexact Unlearning Requires Revisiting Forgetting

Jan 17, 2022
Shashwat Goel, Ameya Prabhu, Ponnurangam Kumaraguru

Via arXiv

From Pivots to Graphs: Augmented CycleDensity as a Generalization to One Time InverseConsultation

Aug 27, 2021
Shashwat Goel, Kunwar Shaanjeet Singh Grover

Via arXiv