Rohin Shah

Google DeepMind

AtP*: An efficient and scalable method for localizing LLM behaviour to components

Mar 01, 2024
Via arXiv

Challenges with unsupervised LLM knowledge discovery

Dec 18, 2023
Via arXiv

BEDD: The MineRL BASALT Evaluation and Demonstrations Dataset for Training and Benchmarking Agents that Solve Fuzzy Tasks

Dec 05, 2023
Via arXiv

Explaining grokking through circuit efficiency

Sep 05, 2023
Via arXiv

Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla

Jul 24, 2023
Via arXiv

Towards Solving Fuzzy Tasks with Human Feedback: A Retrospective of the MineRL BASALT 2022 Competition

Mar 23, 2023
Via arXiv

SIRL: Similarity-based Implicit Representation Learning

Jan 03, 2023
Via arXiv

Goal Misgeneralization: Why Correct Specifications Aren't Enough For Correct Goals

Oct 04, 2022
Via arXiv

An Empirical Investigation of Representation Learning for Imitation

May 16, 2022
Via arXiv

Retrospective on the 2021 BASALT Competition on Learning from Human Feedback

Apr 14, 2022
Via arXiv