Victoria Krakovna

Google DeepMind

Evaluating Frontier Models for Dangerous Capabilities

Mar 20, 2024

Limitations of Agents Simulated by Predictive Models

Feb 08, 2024

Quantifying stability of non-power-seeking in artificial agents

Jan 07, 2024

Gemini: A Family of Highly Capable Multimodal Models

Dec 19, 2023

Power-seeking can be probable and predictive for trained agents

Apr 13, 2023

Goal Misgeneralization: Why Correct Specifications Aren't Enough For Correct Goals

Oct 04, 2022

Avoiding Tampering Incentives in Deep RL via Decoupled Approval

Nov 17, 2020

REALab: An Embedded Perspective on Tampering

Nov 17, 2020

Avoiding Side Effects By Considering Future Tasks

Oct 15, 2020

Modeling AGI Safety Frameworks with Causal Influence Diagrams

Jun 20, 2019