Jonah Brown-Cohen

On scalable oversight with weak LLMs judging strong LLMs

Jul 05, 2024
Via arXiv

Scalable AI Safety via Doubly-Efficient Debate

Nov 23, 2023
Via arXiv

Skill-Mix: a Flexible and Expandable Family of Evaluations for AI models

Oct 26, 2023
Via arXiv

Detecting Adversarial Directions in Deep Reinforcement Learning to Make Robust Decisions

Jun 09, 2023
Via arXiv

Faster Algorithms and Constant Lower Bounds for the Worst-Case Expected Error

Dec 27, 2021
Via arXiv