Peter Henderson

JIGMARK: A Black-Box Approach for Enhancing Image Watermarks against Diffusion Model Edits

Jun 06, 2024

AI Risk Management Should Incorporate Both Safety and Security

May 29, 2024

FLawN-T5: An Empirical Examination of Effective Instruction-Tuning Data Mixtures for Legal Reasoning

Apr 02, 2024

What's in Your "Safe" Data?: Identifying Benign Data that Breaks Safety

Apr 01, 2024

A Safe Harbor for AI Evaluation and Red Teaming

Mar 07, 2024

On the Societal Impact of Open Foundation Models

Feb 27, 2024

Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications

Feb 07, 2024

Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!

Oct 05, 2023

LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models

Aug 20, 2023

Where's the Liability in Harmful AI Speech?

Aug 16, 2023