Picture for Olivia Watkins

Olivia Watkins

Tony

GDPval: Evaluating AI Model Performance on Real-World Economically Valuable Tasks

Add code
Oct 05, 2025
Viaarxiv icon

Persona Features Control Emergent Misalignment

Add code
Jun 24, 2025
Figure 1 for Persona Features Control Emergent Misalignment
Figure 2 for Persona Features Control Emergent Misalignment
Figure 3 for Persona Features Control Emergent Misalignment
Figure 4 for Persona Features Control Emergent Misalignment
Viaarxiv icon

OpenAI o1 System Card

Add code
Dec 21, 2024
Figure 1 for OpenAI o1 System Card
Figure 2 for OpenAI o1 System Card
Figure 3 for OpenAI o1 System Card
Figure 4 for OpenAI o1 System Card
Viaarxiv icon

GPT-4o System Card

Add code
Oct 25, 2024
Viaarxiv icon

A StrongREJECT for Empty Jailbreaks

Add code
Feb 15, 2024
Viaarxiv icon

Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game

Add code
Nov 02, 2023
Viaarxiv icon

Learning to Model the World with Language

Add code
Jul 31, 2023
Figure 1 for Learning to Model the World with Language
Figure 2 for Learning to Model the World with Language
Figure 3 for Learning to Model the World with Language
Figure 4 for Learning to Model the World with Language
Viaarxiv icon

DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models

Add code
May 25, 2023
Figure 1 for DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models
Figure 2 for DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models
Figure 3 for DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models
Figure 4 for DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models
Viaarxiv icon

Aligning Text-to-Image Models using Human Feedback

Add code
Feb 23, 2023
Figure 1 for Aligning Text-to-Image Models using Human Feedback
Figure 2 for Aligning Text-to-Image Models using Human Feedback
Figure 3 for Aligning Text-to-Image Models using Human Feedback
Figure 4 for Aligning Text-to-Image Models using Human Feedback
Viaarxiv icon

Guiding Pretraining in Reinforcement Learning with Large Language Models

Add code
Feb 13, 2023
Figure 1 for Guiding Pretraining in Reinforcement Learning with Large Language Models
Figure 2 for Guiding Pretraining in Reinforcement Learning with Large Language Models
Figure 3 for Guiding Pretraining in Reinforcement Learning with Large Language Models
Figure 4 for Guiding Pretraining in Reinforcement Learning with Large Language Models
Viaarxiv icon