Nat McAleese

Prover-Verifier Games improve legibility of LLM outputs

Jul 18, 2024

LLM Critics Help Catch LLM Bugs

Jun 28, 2024

Fine-tuning language models to find agreement among humans with diverse preferences

Nov 28, 2022

Fine-Tuning Language Models via Epistemic Neural Networks

Nov 03, 2022

Improving alignment of dialogue agents via targeted human judgements

Sep 28, 2022

Teaching language models to support answers with verified quotes

Mar 21, 2022

Red Teaming Language Models with Language Models

Feb 07, 2022

Scaling Language Models: Methods, Analysis & Insights from Training Gopher

Dec 08, 2021

Open-Ended Learning Leads to Generally Capable Agents

Jul 31, 2021