Erik Jones

Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples

Oct 08, 2025
Via arXiv

Uncovering Gaps in How Humans and LLMs Interpret Subjective Language

Mar 06, 2025
Via arXiv

How Do Large Language Monkeys Get Their Power (Laws)?

Feb 24, 2025
Via arXiv

Forecasting Rare Language Model Behaviors

Feb 24, 2025
Via arXiv

Best-of-N Jailbreaking

Dec 04, 2024
Via arXiv

Adversaries Can Misuse Combinations of Safe Models

Jun 20, 2024
Via arXiv

Feedback Loops With Language Models Drive In-Context Reward Hacking

Feb 09, 2024
Via arXiv

Orca 2: Teaching Small Language Models How to Reason

Nov 21, 2023
Via arXiv

Teaching Language Models to Hallucinate Less with Synthetic Tasks

Oct 10, 2023
Via arXiv

Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models

Sep 26, 2023
Via arXiv