Ramana Kumar

Evaluating Frontier Models for Dangerous Capabilities

Mar 20, 2024
Mary Phuong, Matthew Aitchison, Elliot Catt, Sarah Cogan, Alexandre Kaskasoli, Victoria Krakovna, David Lindner, Matthew Rahtz, Yannis Assael, Sarah Hodkinson, Heidi Howard, Tom Lieberum, Ramana Kumar, Maria Abi Raad, Albert Webson, Lewis Ho, Sharon Lin, Sebastian Farquhar, Marcus Hutter, Gregoire Deletang, Anian Ruoss, Seliem El-Sayed, Sasha Brown, Anca Dragan, Rohin Shah, Allan Dafoe, Toby Shevlane

Via arXiv

Explaining grokking through circuit efficiency

Sep 05, 2023
Vikrant Varma, Rohin Shah, Zachary Kenton, János Kramár, Ramana Kumar

Via arXiv

Scaling Goal-based Exploration via Pruning Proto-goals

Feb 09, 2023
Akhil Bagaria, Ray Jiang, Ramana Kumar, Tom Schaul

Via arXiv

Solving math word problems with process- and outcome-based feedback

Nov 25, 2022
Jonathan Uesato, Nate Kushman, Ramana Kumar, Francis Song, Noah Siegel, Lisa Wang, Antonia Creswell, Geoffrey Irving, Irina Higgins

Via arXiv

Goal Misgeneralization: Why Correct Specifications Aren't Enough For Correct Goals

Oct 04, 2022
Rohin Shah, Vikrant Varma, Ramana Kumar, Mary Phuong, Victoria Krakovna, Jonathan Uesato, Zac Kenton

Via arXiv

Discovering Agents

Aug 24, 2022
Zachary Kenton, Ramana Kumar, Sebastian Farquhar, Jonathan Richens, Matt MacDermott, Tom Everitt

Via arXiv

Safe Deep RL in 3D Environments using Human Feedback

Jan 21, 2022
Matthew Rahtz, Vikrant Varma, Ramana Kumar, Zachary Kenton, Shane Legg, Jan Leike

Via arXiv

Formal Methods for the Informal Engineer: Workshop Recommendations

Apr 01, 2021
Gopal Sarma, James Koppel, Gregory Malecha, Patrick Schultz, Eric Drexler, Ramana Kumar, Cody Roux, Philip Zucker

Via arXiv

Avoiding Tampering Incentives in Deep RL via Decoupled Approval

Nov 17, 2020
Jonathan Uesato, Ramana Kumar, Victoria Krakovna, Tom Everitt, Richard Ngo, Shane Legg

Via arXiv

REALab: An Embedded Perspective on Tampering

Nov 17, 2020
Ramana Kumar, Jonathan Uesato, Richard Ngo, Tom Everitt, Victoria Krakovna, Shane Legg

Via arXiv