Ramana Kumar

Evaluating Frontier Models for Dangerous Capabilities

Mar 20, 2024
Via arXiv

Explaining grokking through circuit efficiency

Sep 05, 2023
Via arXiv

Scaling Goal-based Exploration via Pruning Proto-goals

Feb 09, 2023
Via arXiv

Solving math word problems with process- and outcome-based feedback

Nov 25, 2022
Via arXiv

Goal Misgeneralization: Why Correct Specifications Aren't Enough For Correct Goals

Oct 04, 2022
Via arXiv

Discovering Agents

Aug 24, 2022
Via arXiv

Safe Deep RL in 3D Environments using Human Feedback

Jan 21, 2022
Via arXiv

Formal Methods for the Informal Engineer: Workshop Recommendations

Apr 01, 2021
Via arXiv

Avoiding Tampering Incentives in Deep RL via Decoupled Approval

Nov 17, 2020
Via arXiv

REALab: An Embedded Perspective on Tampering

Nov 17, 2020
Via arXiv