Picture for Jan Leike

Jan Leike

Prover-Verifier Games improve legibility of LLM outputs

Add code
Jul 18, 2024
Viaarxiv icon

LLM Critics Help Catch LLM Bugs

Add code
Jun 28, 2024
Figure 1 for LLM Critics Help Catch LLM Bugs
Figure 2 for LLM Critics Help Catch LLM Bugs
Figure 3 for LLM Critics Help Catch LLM Bugs
Figure 4 for LLM Critics Help Catch LLM Bugs
Viaarxiv icon

Scaling and evaluating sparse autoencoders

Add code
Jun 06, 2024
Viaarxiv icon

Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision

Add code
Dec 14, 2023
Viaarxiv icon

Let's Verify Step by Step

Add code
May 31, 2023
Figure 1 for Let's Verify Step by Step
Figure 2 for Let's Verify Step by Step
Figure 3 for Let's Verify Step by Step
Figure 4 for Let's Verify Step by Step
Viaarxiv icon

Self-critiquing models for assisting human evaluators

Add code
Jun 14, 2022
Figure 1 for Self-critiquing models for assisting human evaluators
Figure 2 for Self-critiquing models for assisting human evaluators
Figure 3 for Self-critiquing models for assisting human evaluators
Figure 4 for Self-critiquing models for assisting human evaluators
Viaarxiv icon

Training language models to follow instructions with human feedback

Add code
Mar 04, 2022
Figure 1 for Training language models to follow instructions with human feedback
Figure 2 for Training language models to follow instructions with human feedback
Figure 3 for Training language models to follow instructions with human feedback
Figure 4 for Training language models to follow instructions with human feedback
Viaarxiv icon

Safe Deep RL in 3D Environments using Human Feedback

Add code
Jan 21, 2022
Figure 1 for Safe Deep RL in 3D Environments using Human Feedback
Figure 2 for Safe Deep RL in 3D Environments using Human Feedback
Figure 3 for Safe Deep RL in 3D Environments using Human Feedback
Figure 4 for Safe Deep RL in 3D Environments using Human Feedback
Viaarxiv icon

Recursively Summarizing Books with Human Feedback

Add code
Sep 27, 2021
Figure 1 for Recursively Summarizing Books with Human Feedback
Figure 2 for Recursively Summarizing Books with Human Feedback
Figure 3 for Recursively Summarizing Books with Human Feedback
Figure 4 for Recursively Summarizing Books with Human Feedback
Viaarxiv icon

Evaluating Large Language Models Trained on Code

Add code
Jul 14, 2021
Figure 1 for Evaluating Large Language Models Trained on Code
Figure 2 for Evaluating Large Language Models Trained on Code
Figure 3 for Evaluating Large Language Models Trained on Code
Figure 4 for Evaluating Large Language Models Trained on Code
Viaarxiv icon