Saurav Kadavath

Specific versus General Principles for Constitutional AI

Oct 20, 2023

Measuring Faithfulness in Chain-of-Thought Reasoning

Jul 17, 2023

The Capacity for Moral Self-Correction in Large Language Models

Feb 18, 2023

Discovering Language Model Behaviors with Model-Written Evaluations

Dec 19, 2022

Constitutional AI: Harmlessness from AI Feedback

Dec 15, 2022

DeepChrome 2.0: Investigating and Improving Architectures, Visualizations, & Experiments

Sep 24, 2022

Language Models (Mostly) Know What They Know

Jul 16, 2022

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

Apr 12, 2022

Pretraining & Reinforcement Learning: Sharpening the Axe Before Cutting the Tree

Oct 06, 2021

Measuring Coding Challenge Competence With APPS

May 27, 2021