Peter Hase

Rethinking Machine Unlearning for Large Language Models

Feb 15, 2024
Sijia Liu, Yuanshun Yao, Jinghan Jia, Stephen Casper, Nathalie Baracaldo, Peter Hase, Xiaojun Xu, Yuguang Yao, Hang Li, Kush R. Varshney, Mohit Bansal, Sanmi Koyejo, Yang Liu

The Unreasonable Effectiveness of Easy Training Data for Hard Tasks

Jan 12, 2024
Peter Hase, Mohit Bansal, Peter Clark, Sarah Wiegreffe

Can Sensitive Information Be Deleted From LLMs? Objectives for Defending Against Extraction Attacks

Sep 29, 2023
Vaidehi Patil, Peter Hase, Mohit Bansal

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

Jul 27, 2023
Stephen Casper, Xander Davies, Claudia Shi, Thomas Krendl Gilbert, Jérémy Scheurer, Javier Rando, Rachel Freedman, Tomasz Korbak, David Lindner, Pedro Freire, Tony Wang, Samuel Marks, Charbel-Raphaël Segerie, Micah Carroll, Andi Peng, Phillip Christoffersen, Mehul Damani, Stewart Slocum, Usman Anwar, Anand Siththaranjan, Max Nadeau, Eric J. Michaud, Jacob Pfau, Dmitrii Krasheninnikov, Xin Chen, Lauro Langosco, Peter Hase, Erdem Bıyık, Anca Dragan, David Krueger, Dorsa Sadigh, Dylan Hadfield-Menell

Can Language Models Teach Weaker Agents? Teacher Explanations Improve Students via Theory of Mind

Jun 15, 2023
Swarnadeep Saha, Peter Hase, Mohit Bansal

Adaptive Contextual Perception: How to Generalize to New Backgrounds and Ambiguous Objects

Jun 09, 2023
Zhuofan Ying, Peter Hase, Mohit Bansal

Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models

Jan 10, 2023
Peter Hase, Mohit Bansal, Been Kim, Asma Ghandeharioun

Are Hard Examples also Harder to Explain? A Study with Human and Model-Generated Explanations

Nov 14, 2022
Swarnadeep Saha, Peter Hase, Nazneen Rajani, Mohit Bansal

Summarization Programs: Interpretable Abstractive Summarization with Neural Modular Trees

Sep 21, 2022
Swarnadeep Saha, Shiyue Zhang, Peter Hase, Mohit Bansal

VisFIS: Visual Feature Importance Supervision with Right-for-the-Right-Reason Objectives

Jun 22, 2022
Zhuofan Ying, Peter Hase, Mohit Bansal
