Picture for Eli Tran-Johnson

Eli Tran-Johnson

Specific versus General Principles for Constitutional AI

Add code
Oct 20, 2023
Figure 1 for Specific versus General Principles for Constitutional AI
Figure 2 for Specific versus General Principles for Constitutional AI
Figure 3 for Specific versus General Principles for Constitutional AI
Figure 4 for Specific versus General Principles for Constitutional AI
Viaarxiv icon

The Capacity for Moral Self-Correction in Large Language Models

Add code
Feb 18, 2023
Figure 1 for The Capacity for Moral Self-Correction in Large Language Models
Figure 2 for The Capacity for Moral Self-Correction in Large Language Models
Figure 3 for The Capacity for Moral Self-Correction in Large Language Models
Figure 4 for The Capacity for Moral Self-Correction in Large Language Models
Viaarxiv icon

Discovering Language Model Behaviors with Model-Written Evaluations

Add code
Dec 19, 2022
Figure 1 for Discovering Language Model Behaviors with Model-Written Evaluations
Figure 2 for Discovering Language Model Behaviors with Model-Written Evaluations
Figure 3 for Discovering Language Model Behaviors with Model-Written Evaluations
Figure 4 for Discovering Language Model Behaviors with Model-Written Evaluations
Viaarxiv icon

Constitutional AI: Harmlessness from AI Feedback

Add code
Dec 15, 2022
Figure 1 for Constitutional AI: Harmlessness from AI Feedback
Figure 2 for Constitutional AI: Harmlessness from AI Feedback
Figure 3 for Constitutional AI: Harmlessness from AI Feedback
Figure 4 for Constitutional AI: Harmlessness from AI Feedback
Viaarxiv icon

Measuring Progress on Scalable Oversight for Large Language Models

Add code
Nov 11, 2022
Figure 1 for Measuring Progress on Scalable Oversight for Large Language Models
Figure 2 for Measuring Progress on Scalable Oversight for Large Language Models
Figure 3 for Measuring Progress on Scalable Oversight for Large Language Models
Viaarxiv icon

Language Models (Mostly) Know What They Know

Add code
Jul 16, 2022
Figure 1 for Language Models (Mostly) Know What They Know
Figure 2 for Language Models (Mostly) Know What They Know
Figure 3 for Language Models (Mostly) Know What They Know
Figure 4 for Language Models (Mostly) Know What They Know
Viaarxiv icon