Brendan Murphy

Jailbreak-Tuning: Models Efficiently Learn Jailbreak Susceptibility

Jul 15, 2025
Via arXiv

Targeted Manipulation and Deception Emerge when Optimizing LLMs for User Feedback

Nov 04, 2024
Via arXiv

Scaling Laws for Data Poisoning in LLMs

Aug 06, 2024
Via arXiv