Erik Jones

Adversaries Can Misuse Combinations of Safe Models

Jun 20, 2024

Feedback Loops With Language Models Drive In-Context Reward Hacking

Feb 09, 2024

Orca 2: Teaching Small Language Models How to Reason

Nov 21, 2023

Teaching Language Models to Hallucinate Less with Synthetic Tasks

Oct 10, 2023

Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models

Sep 26, 2023

Mass-Producing Failures of Multimodal Systems with Language Models

Jun 21, 2023

Automatically Auditing Large Language Models via Discrete Optimization

Mar 08, 2023

Capturing Failures of Large Language Models via Human Cognitive Biases

Feb 24, 2022

Selective Classification Can Magnify Disparities Across Groups

Oct 27, 2020

Robust Encodings: A Framework for Combating Adversarial Typos

May 04, 2020