Hado van Hasselt

The Barbados 2018 List of Open Issues in Continual Learning

Nov 16, 2018

Tom Schaul, Hado van Hasselt, Joseph Modayil, Martha White, Adam White, Pierre-Luc Bacon, Jean Harb, Shibl Mourad, Marc Bellemare, Doina Precup

Abstract: We want to make progress toward artificial general intelligence, namely general-purpose agents that autonomously learn how to competently act in complex environments. The purpose of this report is to sketch a research outline, share some of the most important open issues we are facing, and stimulate further discussion in the community. The content is based on some of our discussions during a week-long workshop held in Barbados in February 2018.

* NIPS Continual Learning Workshop 2018

Multi-task Deep Reinforcement Learning with PopArt

Sep 12, 2018

Matteo Hessel, Hubert Soyer, Lasse Espeholt, Wojciech Czarnecki, Simon Schmitt, Hado van Hasselt

Abstract: The reinforcement learning community has made great strides in designing algorithms capable of exceeding human performance on specific tasks. These algorithms are mostly trained one task at a time, with each new task requiring a brand-new agent instance to be trained. This means the learning algorithm is general, but each solution is not; each agent can only solve the one task it was trained on. In this work, we study the problem of learning to master not one but multiple sequential-decision tasks at once. A general issue in multi-task learning is that a balance must be found between the needs of multiple tasks competing for the limited resources of a single learning system. Many learning algorithms can get distracted by certain tasks in the set of tasks to solve. Such tasks appear more salient to the learning process, for instance because of the density or magnitude of the in-task rewards. This causes the algorithm to focus on those salient tasks at the expense of generality. We propose to automatically adapt the contribution of each task to the agent's updates, so that all tasks have a similar impact on the learning dynamics. This resulted in state-of-the-art performance on learning to play all games in a set of 57 diverse Atari games. Excitingly, our method learned a single trained policy - with a single set of weights - that exceeds median human performance. To our knowledge, this was the first time a single agent surpassed human-level performance on this multi-task domain. The same approach also demonstrated state-of-the-art performance on a set of 30 tasks in the 3D reinforcement learning platform DeepMind Lab.
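
One way to give every task a similar impact on the updates is to normalize each task's value targets with running statistics, while rescaling the output layer so that unnormalized predictions stay unchanged. The sketch below illustrates that idea for a single task with a scalar linear head. The update rule, the step size `beta`, and every name in it are assumptions made for illustration; none of them is stated in the abstract.

```python
import math


class PopArtHead:
    """Scalar linear value head with adaptively normalized targets (illustrative)."""

    def __init__(self, n_features, beta=0.1):
        self.w = [0.0] * n_features  # weights of the normalized output
        self.b = 0.0                 # bias of the normalized output
        self.mu = 0.0                # running mean of the targets
        self.nu = 1.0                # running second moment of the targets
        self.beta = beta             # hypothetical step size for the statistics

    @property
    def sigma(self):
        # Standard deviation implied by the running moments, floored for stability.
        return math.sqrt(max(self.nu - self.mu ** 2, 1e-8))

    def normalized(self, features):
        return sum(w * x for w, x in zip(self.w, features)) + self.b

    def unnormalized(self, features):
        return self.sigma * self.normalized(features) + self.mu

    def update_stats(self, target):
        old_mu, old_sigma = self.mu, self.sigma
        self.mu = (1 - self.beta) * self.mu + self.beta * target
        self.nu = (1 - self.beta) * self.nu + self.beta * target ** 2
        new_sigma = self.sigma
        # Rescale weights and bias so that unnormalized predictions are preserved.
        self.w = [w * old_sigma / new_sigma for w in self.w]
        self.b = (old_sigma * self.b + old_mu - self.mu) / new_sigma

    def normalize_target(self, target):
        # Targets on this scale have comparable magnitude across tasks.
        return (target - self.mu) / self.sigma
```

In a multi-task setting one would presumably keep one set of statistics per task, so that a task with large rewards does not dominate the gradient.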

Unicorn: Continual Learning with a Universal, Off-policy Agent

Jul 03, 2018

Daniel J. Mankowitz, Augustin Žídek, André Barreto, Dan Horgan, Matteo Hessel, John Quan, Junhyuk Oh, Hado van Hasselt, David Silver, Tom Schaul

Abstract: Some real-world domains are best characterized as a single task, but for others this perspective is limiting. Instead, some tasks continually grow in complexity, in tandem with the agent's competence. In continual learning, also referred to as lifelong learning, there are no explicit task boundaries or curricula. As learning agents have become more powerful, continual learning remains one of the frontiers that has resisted quick progress. To test continual learning capabilities we consider a challenging 3D domain with an implicit sequence of tasks and sparse rewards. We propose a novel agent architecture called Unicorn, which demonstrates strong continual learning and outperforms several baseline agents on the proposed domain. The agent achieves this by jointly representing and learning multiple policies efficiently, using a parallel off-policy learning setup.

Observe and Look Further: Achieving Consistent Performance on Atari

May 29, 2018

Tobias Pohlen, Bilal Piot, Todd Hester, Mohammad Gheshlaghi Azar, Dan Horgan, David Budden, Gabriel Barth-Maron, Hado van Hasselt, John Quan, Mel Večerík (+3 more)

Abstract: Despite significant advances in the field of deep Reinforcement Learning (RL), today's algorithms still fail to learn human-level policies consistently over a set of diverse tasks such as Atari 2600 games. We identify three key challenges that any algorithm needs to master in order to perform well on all games: processing diverse reward distributions, reasoning over long time horizons, and exploring efficiently. In this paper, we propose an algorithm that addresses each of these challenges and is able to learn human-level policies on nearly all Atari games. A new transformed Bellman operator allows our algorithm to process rewards of varying densities and scales; an auxiliary temporal consistency loss allows us to train stably using a discount factor of $\gamma = 0.999$ (instead of $\gamma = 0.99$), extending the effective planning horizon by an order of magnitude; and we ease the exploration problem by using human demonstrations that guide the agent towards rewarding states. When tested on a set of 42 Atari games, our algorithm exceeds the performance of an average human on 40 games using a common set of hyperparameters. Furthermore, it is the first deep RL algorithm to solve the first level of Montezuma's Revenge.
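
The order-of-magnitude claim follows from the common rule of thumb that a discount factor $\gamma$ corresponds to an effective horizon of roughly $1 / (1 - \gamma)$ steps. The snippet below is a quick check of that arithmetic; the function name is hypothetical.

```python
def effective_horizon(gamma):
    """Approximate number of steps over which rewards matter: 1 / (1 - gamma)."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must be in [0, 1)")
    return 1.0 / (1.0 - gamma)


for gamma in (0.99, 0.999):
    print(f"gamma={gamma}: about {effective_horizon(gamma):.0f} steps")
```

Under this rule of thumb, moving from 0.99 to 0.999 takes the horizon from about 100 steps to about 1000 steps.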

Meta-Gradient Reinforcement Learning

May 24, 2018

Zhongwen Xu, Hado van Hasselt, David Silver

Abstract: The goal of reinforcement learning algorithms is to estimate and/or optimise the value function. However, unlike supervised learning, no teacher or oracle is available to provide the true value function. Instead, the majority of reinforcement learning algorithms estimate and/or optimise a proxy for the value function. This proxy is typically based on a sampled and bootstrapped approximation to the true value function, known as a return. The particular choice of return is one of the chief components determining the nature of the algorithm: the rate at which future rewards are discounted; when and how values should be bootstrapped; or even the nature of the rewards themselves. It is well-known that these decisions are crucial to the overall success of RL algorithms. We discuss a gradient-based meta-learning algorithm that is able to adapt the nature of the return, online, whilst interacting and learning from the environment. When applied to 57 games on the Atari 2600 environment over 200 million frames, our algorithm achieved a new state-of-the-art performance.
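
To make the notion of a return concrete, here is a minimal sketch of an n-step bootstrapped return, where the discount `gamma` and the bootstrap point are the kind of choices the abstract describes. The function names and arguments are hypothetical. The finite-difference sensitivity only illustrates that the return varies smoothly with `gamma`; it is not the meta-gradient algorithm from the paper.

```python
def n_step_return(rewards, bootstrap_value, gamma):
    """Discounted sum of sampled rewards, bootstrapped from a value estimate."""
    g = bootstrap_value
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def sensitivity_to_gamma(rewards, bootstrap_value, gamma, eps=1e-6):
    """Central finite-difference estimate of d(return) / d(gamma)."""
    hi = n_step_return(rewards, bootstrap_value, gamma + eps)
    lo = n_step_return(rewards, bootstrap_value, gamma - eps)
    return (hi - lo) / (2 * eps)


rewards = [1.0, 1.0, 1.0]
print(n_step_return(rewards, bootstrap_value=10.0, gamma=0.5))
print(sensitivity_to_gamma(rewards, bootstrap_value=10.0, gamma=0.5))
```

Because the return is differentiable in `gamma`, a gradient-based method can in principle adjust `gamma` online instead of fixing it in advance.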

Successor Features for Transfer in Reinforcement Learning

Apr 12, 2018

André Barreto, Will Dabney, Rémi Munos, Jonathan J. Hunt, Tom Schaul, Hado van Hasselt, David Silver

Abstract: Transfer in reinforcement learning refers to the notion that generalization should occur not only within a task but also across tasks. We propose a transfer framework for the scenario where the reward function changes between tasks but the environment's dynamics remain the same. Our approach rests on two key ideas: "successor features", a value function representation that decouples the dynamics of the environment from the rewards, and "generalized policy improvement", a generalization of dynamic programming's policy improvement operation that considers a set of policies rather than a single one. Put together, the two ideas lead to an approach that integrates seamlessly within the reinforcement learning framework and allows the free exchange of information across tasks. The proposed method also provides performance guarantees for the transferred policy even before any learning has taken place. We derive two theorems that set our approach on firm theoretical ground and present experiments that show that it successfully promotes transfer in practice, significantly outperforming alternative methods in a sequence of navigation tasks and in the control of a simulated robotic arm.

* Published at NIPS 2017
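
As a rough illustration of the two ideas, suppose each stored policy i has successor features psi_i(s, a), and assume the new task's reward is linear in the features with weights w. Then Q_i(s, a) = psi_i(s, a) · w, and generalized policy improvement picks the action that maximizes max_i Q_i(s, a). The sketch below does this for a single state. The linear-reward assumption, the data layout, and all names are assumptions made for illustration.

```python
def q_values(successor_features, w):
    """Q(s, a) for each action: dot product of psi(s, a) with the task weights w."""
    return [sum(p * wi for p, wi in zip(psi_a, w)) for psi_a in successor_features]


def gpi_action(psi_per_policy, w):
    """Pick argmax_a max_i Q_i(s, a).

    psi_per_policy[i][a] is the successor-feature vector of policy i for action a
    in the current state (hypothetical layout).
    """
    n_actions = len(psi_per_policy[0])
    per_policy_q = [q_values(psi, w) for psi in psi_per_policy]
    best_q = [max(q[a] for q in per_policy_q) for a in range(n_actions)]
    return max(range(n_actions), key=best_q.__getitem__)


# Two stored policies, two actions, two-dimensional features.
psi_per_policy = [
    [[1.0, 0.0], [0.0, 1.0]],  # policy 0
    [[0.5, 0.5], [0.0, 2.0]],  # policy 1
]
print(gpi_action(psi_per_policy, w=[1.0, 0.0]))
print(gpi_action(psi_per_policy, w=[0.0, 1.0]))
```

Only `w` changes between tasks here, which is why a new task can be acted on before any further learning takes place.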

Distributed Prioritized Experience Replay

Mar 02, 2018

Dan Horgan, John Quan, David Budden, Gabriel Barth-Maron, Matteo Hessel, Hado van Hasselt, David Silver

Abstract: We propose a distributed architecture for deep reinforcement learning at scale that enables agents to learn effectively from orders of magnitude more data than previously possible. The algorithm decouples acting from learning: the actors interact with their own instances of the environment by selecting actions according to a shared neural network, and accumulate the resulting experience in a shared experience replay memory; the learner replays samples of experience and updates the neural network. The architecture relies on prioritized experience replay to focus only on the most significant data generated by the actors. Our architecture substantially improves the state of the art on the Arcade Learning Environment, achieving better final performance in a fraction of the wall-clock training time.

* Accepted to International Conference on Learning Representations 2018
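
The prioritization step can be sketched as sampling transitions with probability proportional to a power of their priority. The exponent `alpha`, the buffer layout, and the function names below are assumptions made for illustration; the distributed actor and learner processes are omitted.

```python
import random


def sampling_probabilities(priorities, alpha=0.6):
    """Turn non-negative priorities into a sampling distribution."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def sample_batch(buffer, priorities, batch_size, alpha=0.6, rng=random):
    """Sample (index, transition) pairs, favoring high-priority transitions."""
    probs = sampling_probabilities(priorities, alpha)
    indices = rng.choices(range(len(buffer)), weights=probs, k=batch_size)
    return [(i, buffer[i]) for i in indices]


buffer = ["transition_a", "transition_b", "transition_c"]
priorities = [0.1, 1.0, 5.0]
print(sampling_probabilities(priorities))
print(sample_batch(buffer, priorities, batch_size=4))
```

Returning the indices alongside the transitions lets a learner write updated priorities back to the same slots after computing new errors.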

Rainbow: Combining Improvements in Deep Reinforcement Learning

Oct 06, 2017

Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, David Silver

Abstract: The deep reinforcement learning community has made several independent improvements to the DQN algorithm. However, it is unclear which of these extensions are complementary and can be fruitfully combined. This paper examines six extensions to the DQN algorithm and empirically studies their combination. Our experiments show that the combination provides state-of-the-art performance on the Atari 2600 benchmark, both in terms of data efficiency and final performance. We also provide results from a detailed ablation study that shows the contribution of each component to overall performance.

* Under review as a conference paper at AAAI 2018

StarCraft II: A New Challenge for Reinforcement Learning

Aug 16, 2017

Oriol Vinyals, Timo Ewalds, Sergey Bartunov, Petko Georgiev, Alexander Sasha Vezhnevets, Michelle Yeo, Alireza Makhzani, Heinrich Küttler, John Agapiou, Julian Schrittwieser (+15 more)

Abstract: This paper introduces SC2LE (StarCraft II Learning Environment), a reinforcement learning environment based on the StarCraft II game. This domain poses a new grand challenge for reinforcement learning, representing a more difficult class of problems than considered in most prior work. It is a multi-agent problem with multiple players interacting; there is imperfect information due to a partially observed map; it has a large action space involving the selection and control of hundreds of units; it has a large state space that must be observed solely from raw input feature planes; and it has delayed credit assignment requiring long-term strategies over thousands of steps. We describe the observation, action, and reward specification for the StarCraft II domain and provide an open source Python-based interface for communicating with the game engine. In addition to the main game maps, we provide a suite of mini-games focusing on different elements of StarCraft II gameplay. For the main game maps, we also provide an accompanying dataset of game replay data from human expert players. We give initial baseline results for neural networks trained from this data to predict game outcomes and player actions. Finally, we present initial baseline results for canonical deep reinforcement learning agents applied to the StarCraft II domain. On the mini-games, these agents learn to achieve a level of play that is comparable to a novice player. However, when trained on the main game, these agents are unable to make significant progress. Thus, SC2LE offers a new and challenging environment for exploring deep reinforcement learning algorithms and architectures.

* Collaboration between DeepMind & Blizzard. 20 pages, 9 figures, 2 tables

The Predictron: End-To-End Learning and Planning

Jul 20, 2017

David Silver, Hado van Hasselt, Matteo Hessel, Tom Schaul, Arthur Guez, Tim Harley, Gabriel Dulac-Arnold, David Reichert, Neil Rabinowitz, Andre Barreto (+1 more)

Abstract: One of the key challenges of artificial intelligence is to learn models that are effective in the context of planning. In this document we introduce the predictron architecture. The predictron consists of a fully abstract model, represented by a Markov reward process, that can be rolled forward multiple "imagined" planning steps. Each forward pass of the predictron accumulates internal rewards and values over multiple planning depths. The predictron is trained end-to-end so as to make these accumulated values accurately approximate the true value function. We applied the predictron to procedurally generated random mazes and a simulator for the game of pool. The predictron yielded significantly more accurate predictions than conventional deep neural network architectures.

* Camera-ready version, ICML 2017, with supplement
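
The accumulation over planning depths can be sketched as follows. Given internal rewards r_1..r_K, internal discounts gamma_1..gamma_K, and values v_0..v_K produced by the abstract model, the depth-k estimate sums the first k discounted internal rewards and bootstraps from v_k. The learned model itself is omitted, and the list-based interface and names are assumptions made for illustration.

```python
def depth_estimates(rewards, discounts, values):
    """Return one value estimate per planning depth 0..K.

    rewards and discounts hold K entries (one per imagined step);
    values holds K + 1 entries (one per depth, including depth 0).
    """
    if not (len(rewards) == len(discounts) == len(values) - 1):
        raise ValueError("expected K rewards, K discounts, and K + 1 values")
    estimates = []
    for k in range(len(values)):
        g = values[k]  # bootstrap from the value at depth k
        # Fold the first k imagined steps back toward the root.
        for r, gamma in zip(reversed(rewards[:k]), reversed(discounts[:k])):
            g = r + gamma * g
        estimates.append(g)
    return estimates


print(depth_estimates(rewards=[1.0, 2.0], discounts=[0.5, 0.5], values=[4.0, 4.0, 8.0]))
```

End-to-end training would then push each of these estimates toward the observed outcome, so every depth contributes a learning signal.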
