Eric Wallace

Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation

Jun 28, 2024

The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions

Apr 19, 2024

Unfamiliar Finetuning Examples Control How Language Models Hallucinate

Mar 08, 2024

What Evidence Do Language Models Find Convincing?

Feb 19, 2024

Scalable Extraction of Training Data from (Production) Language Models

Nov 28, 2023

Privacy Side Channels in Machine Learning Systems

Sep 11, 2023

SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore

Aug 08, 2023

The False Promise of Imitating Proprietary LLMs

May 25, 2023

Poisoning Language Models During Instruction Tuning

May 01, 2023

Extracting Training Data from Diffusion Models

Jan 30, 2023