Shane Legg

IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures

Jun 28, 2018

Lasse Espeholt, Hubert Soyer, Remi Munos, Karen Simonyan, Volodymir Mnih, Tom Ward, Yotam Doron, Vlad Firoiu, Tim Harley, Iain Dunning(+2 more)

Abstract: In this work we aim to solve a large collection of tasks using a single reinforcement learning agent with a single set of parameters. A key challenge is to handle the increased amount of data and extended training time. We have developed a new distributed agent IMPALA (Importance Weighted Actor-Learner Architecture) that not only uses resources more efficiently in single-machine training but also scales to thousands of machines without sacrificing data efficiency or resource utilisation. We achieve stable learning at high throughput by combining decoupled acting and learning with a novel off-policy correction method called V-trace. We demonstrate the effectiveness of IMPALA for multi-task reinforcement learning on DMLab-30 (a set of 30 tasks from the DeepMind Lab environment (Beattie et al., 2016)) and Atari-57 (all available Atari games in the Arcade Learning Environment (Bellemare et al., 2013a)). Our results show that IMPALA is able to achieve better performance than previous agents with less data, and crucially exhibits positive transfer between tasks as a result of its multi-task approach.
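
The abstract names V-trace without defining it. The sketch below follows the definition in the full IMPALA paper as we understand it: truncated importance weights rho_t = min(rho_bar, pi/mu) and c_t = min(c_bar, pi/mu), with targets computed backwards via v_s - V(x_s) = delta_s + gamma * c_s * (v_{s+1} - V(x_{s+1})), where delta_s = rho_s * (r_s + gamma * V(x_{s+1}) - V(x_s)). It is a plain-Python illustration that assumes a single trajectory with no episode boundaries and omits the paper's optional lambda factor; the function and parameter names are our own, not those of any released code.

```python
import math


def vtrace_targets(rewards, values, bootstrap_value, log_rhos,
                   gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """Compute V-trace value targets v_s for one trajectory.

    rewards[t], values[t] = V(x_t), log_rhos[t] = log(pi(a_t|x_t) / mu(a_t|x_t))
    for t = 0..n-1, and bootstrap_value = V(x_n). Assumes no episode ends
    inside the trajectory.
    """
    n = len(rewards)
    ratios = [math.exp(lr) for lr in log_rhos]
    rhos = [min(rho_bar, r) for r in ratios]  # truncated weights in the TD error
    cs = [min(c_bar, r) for r in ratios]      # truncated trace coefficients
    next_values = list(values[1:]) + [bootstrap_value]

    vs = [0.0] * n
    acc = 0.0  # holds v_{t+1} - V(x_{t+1}); zero beyond the last step
    for t in reversed(range(n)):
        delta = rhos[t] * (rewards[t] + gamma * next_values[t] - values[t])
        acc = delta + gamma * cs[t] * acc
        vs[t] = values[t] + acc
    return vs


# On-policy check: all log-ratios are zero, so the targets are n-step returns.
print(vtrace_targets([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], 0.0, [0.0, 0.0, 0.0], gamma=1.0))
# [3.0, 2.0, 1.0]
```

When the behaviour and target policies coincide and both truncation levels are 1, the targets reduce to ordinary n-step bootstrapped returns, which makes a convenient sanity check. Truncating the weights keeps the variance of the correction bounded when the actors' policy lags behind the learner's.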

Measuring and avoiding side effects using relative reachability

Jun 04, 2018

Victoria Krakovna, Laurent Orseau, Miljan Martic, Shane Legg

Abstract: How can we design reinforcement learning agents that avoid causing unnecessary disruptions to their environment? We argue that current approaches to penalizing side effects can introduce bad incentives in tasks that require irreversible actions, and in environments that contain sources of change other than the agent. For example, some approaches give the agent an incentive to prevent any irreversible changes in the environment, including the actions of other agents. We introduce a general definition of side effects, based on relative reachability of states compared to a default state, that avoids these undesirable incentives. Using a set of gridworld experiments illustrating relevant scenarios, we empirically compare relative reachability to penalties based on existing definitions and show that it is the only penalty among those tested that produces the desired behavior in all the scenarios.
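
As a rough illustration of the relative reachability idea, the sketch below uses the simplest, undiscounted notion of reachability in a deterministic environment: a state counts as reachable if some action sequence leads to it. The penalty is the fraction of states reachable from the default (baseline) state that are no longer reachable from the current state; only losses count, not gains. The toy graph, state names and the restriction to the undiscounted variant are assumptions for illustration; the paper also discusses a discounted version.

```python
from collections import deque


def reachable_from(graph, start):
    """Breadth-first search: the set of states reachable from `start`."""
    seen = {start}
    frontier = deque([start])
    while frontier:
        state = frontier.popleft()
        for nxt in graph.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


def relative_reachability(graph, current, baseline):
    """Fraction of states reachable from `baseline` but not from `current`."""
    states = set(graph) | {s for succ in graph.values() for s in succ}
    from_current = reachable_from(graph, current)
    from_baseline = reachable_from(graph, baseline)
    lost = sum(1 for s in states if s in from_baseline and s not in from_current)
    return lost / len(states)


# Hypothetical toy world: breaking the vase is irreversible.
VASE_WORLD = {
    "start": ["start", "corridor", "vase_broken"],
    "corridor": ["start", "corridor"],
    "vase_broken": ["vase_broken"],
}
print(relative_reachability(VASE_WORLD, "vase_broken", "start"))  # 2 of 3 states lost
print(relative_reachability(VASE_WORLD, "corridor", "start"))     # 0.0
```

Breaking the vase is penalised because it cuts the agent off from two of the three states, while walking into the corridor is free because the start state remains reachable.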

Agents and Devices: A Relative Definition of Agency

May 31, 2018

Laurent Orseau, Simon McGregor McGill, Shane Legg

Abstract: According to Dennett, the same system may be described using a 'physical' (mechanical) explanatory stance, or using an 'intentional' (belief- and goal-based) explanatory stance. Humans tend to find the physical stance more helpful for certain systems, such as planets orbiting a star, and the intentional stance for others, such as living animals. We define a formal counterpart of physical and intentional stances within computational theory: a description of a system as either a device, or an agent, with the key difference being that 'devices' are directly described in terms of an input-output mapping, while 'agents' are described in terms of the function they optimise. Bayes' rule can then be applied to calculate the subjective probability of a system being a device or an agent, based only on its behaviour. We illustrate this using the trajectories of an object in a toy grid-world domain.
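
The last step described above, turning observed behaviour into a probability of "agent" versus "device", is ordinary Bayesian model comparison. A minimal sketch, assuming we already have the log-likelihood of the observed trajectory under the agent hypothesis class and under the device hypothesis class; how the paper builds those classes and their priors is not reproduced here, and the function name and numbers are hypothetical.

```python
import math


def posterior_agent(log_lik_agent, log_lik_device, prior_agent=0.5):
    """P(agent | behaviour) from two log-likelihoods and a prior, via Bayes' rule."""
    if not 0.0 < prior_agent < 1.0:
        raise ValueError("prior_agent must be strictly between 0 and 1")
    log_a = log_lik_agent + math.log(prior_agent)
    log_d = log_lik_device + math.log(1.0 - prior_agent)
    m = max(log_a, log_d)  # subtract the max for numerical stability
    a, d = math.exp(log_a - m), math.exp(log_d - m)
    return a / (a + d)


# Hypothetical numbers: the trajectory is 4x more likely under the agent hypothesis.
print(posterior_agent(math.log(0.2), math.log(0.05)))  # about 0.8
```

Working in log space matters here because trajectory likelihoods are products over many steps and quickly underflow as plain floats.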

Noisy Networks for Exploration

Feb 15, 2018

Meire Fortunato, Mohammad Gheshlaghi Azar, Bilal Piot, Jacob Menick, Ian Osband, Alex Graves, Vlad Mnih, Remi Munos, Demis Hassabis, Olivier Pietquin(+2 more)

Abstract: We introduce NoisyNet, a deep reinforcement learning agent with parametric noise added to its weights, and show that the induced stochasticity of the agent's policy can be used to aid efficient exploration. The parameters of the noise are learned with gradient descent along with the remaining network weights. NoisyNet is straightforward to implement and adds little computational overhead. We find that replacing the conventional exploration heuristics for A3C, DQN and dueling agents (entropy reward and ε-greedy respectively) with NoisyNet yields substantially higher scores for a wide range of Atari games, in some cases advancing the agent from sub- to super-human performance.

* ICLR 2018
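
A noisy linear layer replaces each weight w with mu + sigma * eps, where mu and sigma are learned and eps is freshly sampled noise. The plain-Python sketch below uses factorised Gaussian noise, applying f(x) = sgn(x)·sqrt(|x|) to one noise vector over inputs and one over outputs, and the initialisation the paper suggests for that variant as we recall it (mu uniform in ±1/sqrt(p), sigma = sigma0/sqrt(p) with sigma0 = 0.5, p being the number of inputs). Gradient updates are omitted; in a real agent mu and sigma would be trainable parameters in a deep learning framework. Class and argument names are our own.

```python
import math
import random


def f(x):
    """Noise scaling for factorised Gaussian noise: sign(x) * sqrt(|x|)."""
    return math.copysign(math.sqrt(abs(x)), x)


class NoisyLinear:
    """Linear layer y = (mu_w + sigma_w * eps_w) x + mu_b + sigma_b * eps_b."""

    def __init__(self, n_in, n_out, sigma0=0.5, rng=None):
        self.n_in, self.n_out = n_in, n_out
        self.rng = rng or random.Random()
        bound = 1.0 / math.sqrt(n_in)
        self.mu_w = [[self.rng.uniform(-bound, bound) for _ in range(n_in)]
                     for _ in range(n_out)]
        self.mu_b = [self.rng.uniform(-bound, bound) for _ in range(n_out)]
        init_sigma = sigma0 / math.sqrt(n_in)
        self.sigma_w = [[init_sigma] * n_in for _ in range(n_out)]
        self.sigma_b = [init_sigma] * n_out

    def forward(self, x, noisy=True):
        if noisy:
            # Factorised noise: eps_w[j][i] = eps_out[j] * eps_in[i] and
            # eps_b[j] = eps_out[j], with f already applied to both vectors.
            eps_in = [f(self.rng.gauss(0.0, 1.0)) for _ in range(self.n_in)]
            eps_out = [f(self.rng.gauss(0.0, 1.0)) for _ in range(self.n_out)]
        else:
            # Deterministic pass using only the mean weights.
            eps_in, eps_out = [0.0] * self.n_in, [0.0] * self.n_out
        y = []
        for j in range(self.n_out):
            acc = self.mu_b[j] + self.sigma_b[j] * eps_out[j]
            for i in range(self.n_in):
                w = self.mu_w[j][i] + self.sigma_w[j][i] * eps_out[j] * eps_in[i]
                acc += w * x[i]
            y.append(acc)
        return y


layer = NoisyLinear(3, 2, rng=random.Random(0))
print(layer.forward([1.0, 2.0, 3.0]))  # differs from call to call
```

Factorising the noise needs only n_in + n_out Gaussian samples per forward pass instead of n_in × n_out, which is the main reason to prefer it for larger layers.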

Psychlab: A Psychology Laboratory for Deep Reinforcement Learning Agents

Feb 04, 2018

Joel Z. Leibo, Cyprien de Masson d'Autume, Daniel Zoran, David Amos, Charles Beattie, Keith Anderson, Antonio García Castañeda, Manuel Sanchez, Simon Green, Audrunas Gruslys(+3 more)

Abstract: Psychlab is a simulated psychology laboratory inside the first-person 3D game world of DeepMind Lab (Beattie et al. 2016). Psychlab enables implementations of classical laboratory psychological experiments so that they work with both human and artificial agents. Psychlab has a simple and flexible API that enables users to easily create their own tasks. As examples, we are releasing Psychlab implementations of several classical experimental paradigms including visual search, change detection, random dot motion discrimination, and multiple object tracking. We also contribute a study of the visual psychophysics of a specific state-of-the-art deep reinforcement learning agent: UNREAL (Jaderberg et al. 2016). This study leads to the surprising conclusion that UNREAL learns more quickly about larger target stimuli than it does about smaller stimuli. In turn, this insight motivates a specific improvement in the form of a simple model of foveal vision that turns out to significantly boost UNREAL's performance, both on Psychlab tasks, and on standard DeepMind Lab tasks. By open-sourcing Psychlab we hope to facilitate a range of future such studies that simultaneously advance deep reinforcement learning and improve its links with cognitive science.

* 28 pages, 11 figures

AI Safety Gridworlds

Nov 28, 2017

Jan Leike, Miljan Martic, Victoria Krakovna, Pedro A. Ortega, Tom Everitt, Andrew Lefrancq, Laurent Orseau, Shane Legg

Abstract: We present a suite of reinforcement learning environments illustrating various safety properties of intelligent agents. These problems include safe interruptibility, avoiding side effects, absent supervisor, reward gaming, safe exploration, as well as robustness to self-modification, distributional shift, and adversaries. To measure compliance with the intended safe behavior, we equip each environment with a performance function that is hidden from the agent. This allows us to categorize AI safety problems into robustness and specification problems, depending on whether the performance function corresponds to the observed reward function. We evaluate A2C and Rainbow, two recent deep reinforcement learning agents, on our environments and show that they are not able to solve them satisfactorily.
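
The split between the observed reward and a hidden performance function can be shown with a deliberately tiny, hypothetical environment: the agent reaches a goal either by a short path that knocks over a vase or by a longer path that avoids it. The reward only reflects path length, while the hidden performance also subtracts a side-effect penalty. Because the two disagree, this would be a specification problem in the paper's terminology. The class name, actions and numbers are illustrative assumptions and are not the interface of the released gridworlds.

```python
class ToyVaseEnv:
    """Hypothetical one-step task: reach the goal by a 'short' or a 'long' path.

    The short path knocks over a vase. The agent only ever sees the returned
    reward; hidden_performance() is visible to the experimenter alone.
    """

    REWARDS = {"short": 10.0, "long": 8.0}  # illustrative: shorter path, higher reward
    SIDE_EFFECT_PENALTY = 5.0               # counted only in the hidden performance

    def __init__(self):
        self._performance = 0.0

    def step(self, action):
        if action not in self.REWARDS:
            raise ValueError(f"unknown action: {action!r}")
        reward = self.REWARDS[action]
        penalty = self.SIDE_EFFECT_PENALTY if action == "short" else 0.0
        self._performance += reward - penalty
        return reward

    def hidden_performance(self):
        return self._performance


# A pure reward maximiser prefers the short path; the performance function does not.
best_by_reward = max(ToyVaseEnv.REWARDS, key=lambda a: ToyVaseEnv().step(a))
print(best_by_reward)  # short
```

Scoring the agent on hidden_performance() rather than on its return is what lets the benchmark detect that a high-reward policy is still unsafe.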

Reinforcement Learning with a Corrupted Reward Channel

Aug 19, 2017

Tom Everitt, Victoria Krakovna, Laurent Orseau, Marcus Hutter, Shane Legg

Abstract: No real-world reward function is perfect. Sensory errors and software bugs may result in RL agents observing higher (or lower) rewards than they should. For example, a reinforcement learning agent may prefer states where a sensory error gives it the maximum reward, but where the true reward is actually small. We formalise this problem as a generalised Markov Decision Problem called Corrupt Reward MDP (CRMDP). Traditional RL methods fare poorly in CRMDPs, even under strong simplifying assumptions and when trying to compensate for the possibly corrupt rewards. Two ways around the problem are investigated. First, by giving the agent richer data, such as in inverse reinforcement learning and semi-supervised reinforcement learning, reward corruption stemming from systematic sensory errors may sometimes be completely managed. Second, by using randomisation to blunt the agent's optimisation, reward corruption can be partially managed under some assumptions.

* A shorter version of this report was accepted to IJCAI 2017 AI and Autonomy track
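
One concrete form of "randomisation to blunt the agent's optimisation" is quantilisation: instead of taking the single option with the highest observed reward, the agent samples uniformly from the top q fraction of options. If only a few options carry corrupt, inflated rewards, the chance of landing on one of them is limited by their share of that top slice. The sketch assumes a finite set of options and a known observed-reward function; names and numbers are hypothetical.

```python
import math
import random


def quantilise(options, observed_reward, q, rng=random):
    """Sample uniformly from the top-q fraction of options ranked by observed reward."""
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in (0, 1]")
    ranked = sorted(options, key=observed_reward, reverse=True)
    k = max(1, math.ceil(q * len(ranked)))  # size of the top-q slice, at least 1
    return rng.choice(ranked[:k])


# Hypothetical corruption: option 3 has true value 3 but a faulty sensor reports 100.
observed = lambda o: 100.0 if o == 3 else float(o)
rng = random.Random(0)
print(max(range(10), key=observed))               # a pure maximiser always picks 3
print(quantilise(range(10), observed, 0.5, rng))  # one of 3, 9, 8, 7, 6
```

With q = 0.5 the corrupt option is chosen only one time in five, at the cost of sometimes picking a genuinely worse option; smaller q trades robustness for reward.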

Deep reinforcement learning from human preferences

Jul 13, 2017

Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei

Abstract: For sophisticated reinforcement learning (RL) systems to interact usefully with real-world environments, we need to communicate complex goals to these systems. In this work, we explore goals defined in terms of (non-expert) human preferences between pairs of trajectory segments. We show that this approach can effectively solve complex RL tasks without access to the reward function, including Atari games and simulated robot locomotion, while providing feedback on less than one percent of our agent's interactions with the environment. This reduces the cost of human oversight far enough that it can be practically applied to state-of-the-art RL systems. To demonstrate the flexibility of our approach, we show that we can successfully train complex novel behaviors with about an hour of human time. These behaviors and environments are considerably more complex than any that have been previously learned from human feedback.
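
The human comparisons are used to fit a reward model. In the formulation we recall from the paper, the predicted probability that a human prefers segment 1 over segment 2 is a softmax over the summed predicted rewards of the two segments (a Bradley-Terry model), and the model is trained with cross-entropy against the human label. A minimal sketch with per-step predicted rewards given as plain lists; the reward network that produces them is left out, and the function names are our own.

```python
import math


def preference_log_probs(pred_rewards_1, pred_rewards_2):
    """Log P[segment 1 preferred], log P[segment 2 preferred] (Bradley-Terry form)."""
    s1, s2 = sum(pred_rewards_1), sum(pred_rewards_2)
    m = max(s1, s2)
    log_z = m + math.log(math.exp(s1 - m) + math.exp(s2 - m))  # stable log-sum-exp
    return s1 - log_z, s2 - log_z


def preference_loss(pred_rewards_1, pred_rewards_2, mu1):
    """Cross-entropy against the human label.

    mu1 = 1.0 if segment 1 was preferred, 0.0 if segment 2, 0.5 if judged equal.
    """
    lp1, lp2 = preference_log_probs(pred_rewards_1, pred_rewards_2)
    return -(mu1 * lp1 + (1.0 - mu1) * lp2)


# Predicted per-step rewards for two short segments (hypothetical values).
print(preference_loss([0.5, 1.0], [0.2, 0.1], mu1=1.0))  # small: model agrees with label
```

Only the difference of summed rewards enters the probability, so the learned reward is identified up to an additive constant, which is harmless for policy optimisation.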

DeepMind Lab

Dec 13, 2016

Charles Beattie, Joel Z. Leibo, Denis Teplyashin, Tom Ward, Marcus Wainwright, Heinrich Küttler, Andrew Lefrancq, Simon Green, Víctor Valdés, Amir Sadik(+11 more)

Abstract: DeepMind Lab is a first-person 3D game platform designed for research and development of general artificial intelligence and machine learning systems. DeepMind Lab can be used to study how autonomous artificial agents may learn complex tasks in large, partially observed, and visually diverse worlds. DeepMind Lab has a simple and flexible API enabling creative task-designs and novel AI-designs to be explored and quickly iterated upon. It is powered by a fast and widely recognised game engine, and tailored for effective use by the research community.

* 11 pages, 8 figures

Massively Parallel Methods for Deep Reinforcement Learning

Jul 16, 2015

Arun Nair, Praveen Srinivasan, Sam Blackwell, Cagdas Alcicek, Rory Fearon, Alessandro De Maria, Vedavyas Panneershelvam, Mustafa Suleyman, Charles Beattie, Stig Petersen(+4 more)

Abstract: We present the first massively distributed architecture for deep reinforcement learning. This architecture uses four main components: parallel actors that generate new behaviour; parallel learners that are trained from stored experience; a distributed neural network to represent the value function or behaviour policy; and a distributed store of experience. We used our architecture to implement the Deep Q-Network algorithm (DQN). Our distributed algorithm was applied to 49 Atari 2600 games from the Arcade Learning Environment, using identical hyperparameters. Our performance surpassed non-distributed DQN in 41 of the 49 games and also reduced the wall-time required to achieve these results by an order of magnitude on most games.

* Presented at the Deep Learning Workshop, International Conference on Machine Learning, Lille, France, 2015
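
The four components listed in the abstract (actors, learners, a shared network representation and a distributed experience store) can be illustrated in a single process. The sketch below is a hypothetical round-robin simulation, not the paper's system: a table of Q-values stands in for the neural network, a bounded deque stands in for the distributed replay memory, learners apply DQN-style updates directly to a shared parameter server instead of sending gradients to it, and a periodically refreshed snapshot plays the role of the target network. The two-state environment and all hyperparameters are assumptions chosen for illustration.

```python
import random
from collections import deque

GAMMA = 0.9
N_ACTIONS = 2


def env_step(state, action):
    """Hypothetical 2-state chain; returns (reward, next_state, done)."""
    if action == 0:
        return 0.0, None, True  # action 0 always ends the episode with no reward
    if state == 0:
        return 0.0, 1, False    # action 1 in state 0 moves to state 1
    return 1.0, None, True      # action 1 in state 1 pays 1 and ends the episode


class ParameterServer:
    """Shared Q-table that every actor reads and every learner writes."""

    def __init__(self):
        self.q = {s: [0.0] * N_ACTIONS for s in (0, 1)}

    def snapshot(self):
        return {s: list(v) for s, v in self.q.items()}


def actor_episode(params, replay, rng, epsilon=0.5):
    """Run one epsilon-greedy episode with a parameter copy, storing transitions."""
    state, done = 0, False
    while not done:
        if rng.random() < epsilon:
            action = rng.randrange(N_ACTIONS)
        else:
            action = max(range(N_ACTIONS), key=lambda a: params[state][a])
        reward, next_state, done = env_step(state, action)
        replay.append((state, action, reward, next_state, done))
        state = next_state


def learner_step(server, target, replay, rng, lr=0.1, batch=8):
    """Sample stored experience and apply DQN-style updates to the server."""
    for s, a, r, s2, done in rng.sample(list(replay), min(batch, len(replay))):
        y = r if done else r + GAMMA * max(target[s2])
        server.q[s][a] += lr * (y - server.q[s][a])


def train(n_actors=4, n_learners=2, rounds=300, target_sync=10, seed=0):
    rng = random.Random(seed)
    server = ParameterServer()
    replay = deque(maxlen=1000)
    target = server.snapshot()
    for step in range(rounds):
        params = server.snapshot()  # actors act with a possibly stale copy
        for _ in range(n_actors):
            actor_episode(params, replay, rng)
        for _ in range(n_learners):
            learner_step(server, target, replay, rng)
        if step % target_sync == 0:
            target = server.snapshot()
    return server.q


print(train())  # Q(0, 1) approaches 0.9 and Q(1, 1) approaches 1.0
```

Separating acting from learning this way means more actors only add data to the store and more learners only add updates to the shared parameters; in the distributed setting the round-robin loop is replaced by processes running concurrently on many machines.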
