Peter Henderson

The Responsible Foundation Model Development Cheatsheet: A Review of Tools & Resources

Jun 26, 2024

Evaluating Copyright Takedown Methods for Language Models

Jun 26, 2024

Fantastic Copyrighted Beasts and How (Not) to Generate Them

Jun 20, 2024

SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors

Jun 20, 2024

Safety Alignment Should Be Made More Than Just a Few Tokens Deep

Jun 10, 2024

JIGMARK: A Black-Box Approach for Enhancing Image Watermarks against Diffusion Model Edits

Jun 06, 2024

AI Risk Management Should Incorporate Both Safety and Security

May 29, 2024

FLawN-T5: An Empirical Examination of Effective Instruction-Tuning Data Mixtures for Legal Reasoning

Apr 02, 2024

What's in Your "Safe" Data?: Identifying Benign Data that Breaks Safety

Apr 01, 2024

A Safe Harbor for AI Evaluation and Red Teaming

Mar 07, 2024