Zachary Kenton

On scalable oversight with weak LLMs judging strong LLMs

Jul 05, 2024
Via arXiv

A Mechanism-Based Approach to Mitigating Harms from Persuasive Generative AI

Apr 23, 2024
Via arXiv

Challenges with unsupervised LLM knowledge discovery

Dec 18, 2023
Via arXiv

Explaining grokking through circuit efficiency

Sep 05, 2023
Via arXiv

Discovering Agents

Aug 24, 2022
Via arXiv

Safe Deep RL in 3D Environments using Human Feedback

Jan 21, 2022
Via arXiv

Alignment of Language Agents

Mar 26, 2021
Via arXiv

Imitating Interactive Intelligence

Jan 21, 2021
Via arXiv

A Systematic Comparison of Bayesian Deep Learning Robustness in Diabetic Retinopathy Tasks

Dec 22, 2019
Via arXiv

Generalizing from a few environments in safety-critical reinforcement learning

Jul 02, 2019
Via arXiv