Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Vladimir Mikulik

Alignment of Language Agents

Mar 26, 2021

Zachary Kenton, Tom Everitt, Laura Weidinger, Iason Gabriel, Vladimir Mikulik, Geoffrey Irving

Abstract:For artificial intelligence to be beneficial to humans the behaviour of AI agents needs to be aligned with what humans want. In this paper we discuss some behavioural issues for language agents, arising from accidental misspecification by the system designer. We highlight some ways that misspecification can occur and discuss some behavioural issues that could arise from misspecification, including deceptive or manipulative language, and review some approaches for avoiding these issues.

Via

Access Paper or Ask Questions

Causal Analysis of Agent Behavior for AI Safety

Mar 05, 2021

Grégoire Déletang, Jordi Grau-Moya, Miljan Martic, Tim Genewein, Tom McGrath, Vladimir Mikulik, Markus Kunesch, Shane Legg, Pedro A. Ortega

Figure 1 for Causal Analysis of Agent Behavior for AI Safety

Figure 2 for Causal Analysis of Agent Behavior for AI Safety

Figure 3 for Causal Analysis of Agent Behavior for AI Safety

Figure 4 for Causal Analysis of Agent Behavior for AI Safety

Abstract:As machine learning systems become more powerful they also become increasingly unpredictable and opaque. Yet, finding human-understandable explanations of how they work is essential for their safe deployment. This technical report illustrates a methodology for investigating the causal mechanisms that drive the behaviour of artificial agents. Six use cases are covered, each addressing a typical question an analyst might ask about an agent. In particular, we show that each question cannot be addressed by pure observation alone, but instead requires conducting experiments with systematically chosen manipulations so as to generate the correct causal evidence.

* 16 pages, 16 figures, 6 tables

Via

Access Paper or Ask Questions

Algorithms for Causal Reasoning in Probability Trees

Nov 12, 2020

Tim Genewein, Tom McGrath, Grégoire Déletang, Vladimir Mikulik, Miljan Martic, Shane Legg, Pedro A. Ortega

Figure 1 for Algorithms for Causal Reasoning in Probability Trees

Figure 2 for Algorithms for Causal Reasoning in Probability Trees

Figure 3 for Algorithms for Causal Reasoning in Probability Trees

Figure 4 for Algorithms for Causal Reasoning in Probability Trees

Abstract:Probability trees are one of the simplest models of causal generative processes. They possess clean semantics and -- unlike causal Bayesian networks -- they can represent context-specific causal dependencies, which are necessary for e.g. causal induction. Yet, they have received little attention from the AI and ML community. Here we present concrete algorithms for causal reasoning in discrete probability trees that cover the entire causal hierarchy (association, intervention, and counterfactuals), and operate on arbitrary propositional and causal events. Our work expands the domain of causal reasoning to a very general class of discrete stochastic processes.

* (2nd version with correction to algorithm) 11 pages, 8 figures, 5 algorithms. A companion Colaboratory tutorial is available at https://github.com/deepmind/deepmind-research/tree/master/causal_reasoning

Via

Access Paper or Ask Questions

Meta-trained agents implement Bayes-optimal agents

Oct 21, 2020

Vladimir Mikulik, Grégoire Delétang, Tom McGrath, Tim Genewein, Miljan Martic, Shane Legg, Pedro A. Ortega

Figure 1 for Meta-trained agents implement Bayes-optimal agents

Figure 2 for Meta-trained agents implement Bayes-optimal agents

Figure 3 for Meta-trained agents implement Bayes-optimal agents

Figure 4 for Meta-trained agents implement Bayes-optimal agents

Abstract:Memory-based meta-learning is a powerful technique to build agents that adapt fast to any task within a target distribution. A previous theoretical study has argued that this remarkable performance is because the meta-training protocol incentivises agents to behave Bayes-optimally. We empirically investigate this claim on a number of prediction and bandit tasks. Inspired by ideas from theoretical computer science, we show that meta-learned and Bayes-optimal agents not only behave alike, but they even share a similar computational structure, in the sense that one agent system can approximately simulate the other. Furthermore, we show that Bayes-optimal agents are fixed points of the meta-learning dynamics. Our results suggest that memory-based meta-learning might serve as a general technique for numerically approximating Bayes-optimal agents - that is, even for task distributions for which we currently don't possess tractable models.

* Published at 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada

Via

Access Paper or Ask Questions

Neural networks are a priori biased towards Boolean functions with low entropy

Sep 29, 2019

Chris Mingard, Joar Skalse, Guillermo Valle-Pérez, David Martínez-Rubio, Vladimir Mikulik, Ard A. Louis

Figure 1 for Neural networks are a priori biased towards Boolean functions with low entropy

Figure 2 for Neural networks are a priori biased towards Boolean functions with low entropy

Figure 3 for Neural networks are a priori biased towards Boolean functions with low entropy

Figure 4 for Neural networks are a priori biased towards Boolean functions with low entropy

Abstract:Understanding the inductive bias of neural networks is critical to explaining their ability to generalise. Here, for one of the simplest neural networks -- a single-layer perceptron with $n$ input neurons, one output neuron, and no threshold bias term -- we prove that upon random initialisation of weights, the a priori probability $P(t)$ that it represents a Boolean function that classifies $t$ points in $\{0,1\}^n$ as $1$ has a remarkably simple form: $ P(t) = 2^{-n} \,\, {\rm for} \,\, 0\leq t < 2^n$. Since a perceptron can express far fewer Boolean functions with small or large values of $t$ (low "entropy") than with intermediate values of $t$ (high "entropy") there is, on average, a strong intrinsic a-priori bias towards individual functions with low entropy. Furthermore, within a class of functions with fixed $t$, we often observe a further intrinsic bias towards functions of lower complexity. Finally, we prove that, regardless of the distribution of inputs, the bias towards low entropy becomes monotonically stronger upon adding ReLU layers, and empirically show that increasing the variance of the bias term has a similar effect.

* Under review as a conference paper at ICLR 2020

Via

Access Paper or Ask Questions

Risks from Learned Optimization in Advanced Machine Learning Systems

Jun 11, 2019

Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, Scott Garrabrant

Figure 1 for Risks from Learned Optimization in Advanced Machine Learning Systems

Figure 2 for Risks from Learned Optimization in Advanced Machine Learning Systems

Figure 3 for Risks from Learned Optimization in Advanced Machine Learning Systems

Abstract:We analyze the type of learned optimization that occurs when a learned model (such as a neural network) is itself an optimizer - a situation we refer to as mesa-optimization, a neologism we introduce in this paper. We believe that the possibility of mesa-optimization raises two important questions for the safety and transparency of advanced machine learning systems. First, under what circumstances will learned models be optimizers, including when they should not be? Second, when a learned model is an optimizer, what will its objective be - how will it differ from the loss function it was trained under - and how can it be aligned? In this paper, we provide an in-depth analysis of these two primary questions and provide an overview of topics for future research.

Via

Access Paper or Ask Questions