Kellin Pelrine

Jailbreak-Tuning: Models Efficiently Learn Jailbreak Susceptibility

Jul 15, 2025

Accidental Misalignment: Fine-Tuning Language Models Induces Unexpected Vulnerability

May 22, 2025

The Structural Safety Generalization Problem

Apr 13, 2025

From Intuition to Understanding: Using AI Peers to Overcome Physics Misconceptions

Apr 01, 2025

Epistemic Integrity in Large Language Models

Nov 10, 2024

A Guide to Misinformation Detection Datasets

Nov 07, 2024

A Simulation System Towards Solving Societal-Scale Manipulation

Oct 17, 2024

Emerging Vulnerabilities in Frontier Models: Multi-Turn Jailbreak Attacks

Aug 29, 2024

Scaling Laws for Data Poisoning in LLMs

Aug 06, 2024

Can Go AIs be adversarially robust?

Jun 18, 2024