Jamie Hayes

Cascading Adversarial Bias from Injection to Distillation in Language Models

May 30, 2025
Via arXiv

Strong Membership Inference Attacks on Massive Datasets and (Moderately) Large Language Models

May 24, 2025
Via arXiv

Lessons from Defending Gemini Against Indirect Prompt Injections

May 20, 2025
Via arXiv

Defeating Prompt Injections by Design

Mar 24, 2025
Via arXiv

$(\varepsilon, \delta)$ Considered Harmful: Best Practices for Reporting Differential Privacy Guarantees

Mar 13, 2025
Via arXiv

Interpreting the Repeated Token Phenomenon in Large Language Models

Mar 11, 2025
Via arXiv

Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI Policy, Research, and Practice

Dec 09, 2024
Via arXiv

To Shuffle or not to Shuffle: Auditing DP-SGD with Shuffling

Nov 15, 2024
Via arXiv

Stealing User Prompts from Mixture of Experts

Oct 30, 2024
Via arXiv

Measuring memorization through probabilistic discoverable extraction

Oct 25, 2024
Via arXiv