Tom Le Paine

RL Unplugged: Benchmarks for Offline Reinforcement Learning

Jul 02, 2020

Caglar Gulcehre, Ziyu Wang, Alexander Novikov, Tom Le Paine, Sergio Gomez Colmenarejo, Konrad Zolna, Rishabh Agarwal, Josh Merel, Daniel Mankowitz, Cosmin Paduraru(+8 more)

Abstract: Offline methods for reinforcement learning have the potential to help bridge the gap between reinforcement learning research and real-world applications. They make it possible to learn policies from offline datasets, thus overcoming concerns associated with online data collection in the real world, including cost, safety, or ethical concerns. In this paper, we propose a benchmark called RL Unplugged to evaluate and compare offline RL methods. RL Unplugged includes data from a diverse range of domains including games (e.g., Atari benchmark) and simulated motor control problems (e.g., DM Control Suite). The datasets include domains that are partially or fully observable, use continuous or discrete actions, and have stochastic vs. deterministic dynamics. We propose detailed evaluation protocols for each domain in RL Unplugged and provide an extensive analysis of supervised learning and offline RL methods using these protocols. We will release data for all our tasks and open-source all algorithms presented in this paper. We hope that our suite of benchmarks will increase the reproducibility of experiments and make it possible to study challenging tasks with a limited computational budget, thus making RL research both more systematic and more accessible across the community. Moving forward, we view RL Unplugged as a living benchmark suite that will evolve and grow with datasets contributed by the research community and ourselves. Our project page is available on GitHub (https://git.io/JJUhd).

* 21 pages including supplementary material, the github link for the datasets: https://github.com/deepmind/deepmind-research/rl_unplugged

Acme: A Research Framework for Distributed Reinforcement Learning

Jun 01, 2020

Matt Hoffman, Bobak Shahriari, John Aslanides, Gabriel Barth-Maron, Feryal Behbahani, Tamara Norman, Abbas Abdolmaleki, Albin Cassirer, Fan Yang, Kate Baumli(+10 more)

Abstract: Deep reinforcement learning has led to many recent, and groundbreaking, advancements. However, these advances have often come at the cost of both the scale and complexity of the underlying RL algorithms. Increases in complexity have in turn made it more difficult for researchers to reproduce published RL algorithms or rapidly prototype ideas. To address this, we introduce Acme, a tool to simplify the development of novel RL algorithms that is specifically designed to enable simple agent implementations that can be run at various scales of execution. Our aim is also to make the results of various RL algorithms developed in academia and industrial labs easier to reproduce and extend. To this end we are releasing baseline implementations of various algorithms, created using our framework. In this work we introduce the major design decisions behind Acme and show how these are used to construct these baselines. We also experiment with these agents at different scales of both complexity and computation, including distributed versions. Ultimately, we show that the design decisions behind Acme lead to agents that can be scaled both up and down and that, for the most part, greater levels of parallelization result in agents with equivalent performance, just faster.

Improving the Gating Mechanism of Recurrent Neural Networks

Oct 22, 2019

Albert Gu, Caglar Gulcehre, Tom Le Paine, Matt Hoffman, Razvan Pascanu

Abstract: Gating mechanisms are widely used in neural network models, where they allow gradients to backpropagate more easily through depth or time. However, their saturation property introduces problems of its own. For example, in recurrent models these gates need to have outputs near 1 to propagate information over long time-delays, which requires them to operate in their saturation regime and hinders gradient-based learning of the gate mechanism. We address this problem by deriving two synergistic modifications to the standard gating mechanism that are easy to implement, introduce no additional hyperparameters, and improve learnability of the gates when they are close to saturation. We show how these changes are related to and improve on alternative recently proposed gating mechanisms such as chrono-initialization and Ordered Neurons. Empirically, our simple gating mechanisms robustly improve the performance of recurrent models on a range of applications, including synthetic memorization tasks, sequential image classification, language modeling, and reinforcement learning, particularly when long-term dependencies are involved.
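
The saturation problem described in this abstract can be illustrated with a few lines of arithmetic. The sketch below is not the paper's method; it assumes a single scalar sigmoid gate held at a constant value f, so that a fraction f**T of a stored value survives T steps. The helper names (`gate_needed`, `sigmoid_grad_at_output`, `logit`) are hypothetical and introduced only for this illustration.

```python
import math

def gate_needed(horizon, retain=0.5):
    """Constant gate value f such that f**horizon == retain."""
    return retain ** (1.0 / horizon)

def logit(f):
    """Pre-activation z for which sigmoid(z) == f."""
    return math.log(f / (1.0 - f))

def sigmoid_grad_at_output(f):
    """Derivative of sigmoid(z) with respect to z, written via f = sigmoid(z)."""
    return f * (1.0 - f)

# Longer memory horizons need a gate closer to 1, where the sigmoid is flat
# and the gradient reaching the gate's pre-activation becomes very small.
for horizon in (10, 100, 1000):
    f = gate_needed(horizon)
    print(f"T={horizon:5d}  gate={f:.5f}  "
          f"pre-activation={logit(f):.2f}  gradient={sigmoid_grad_at_output(f):.6f}")
```

Under these assumptions the gradient shrinks roughly in proportion to 1/T, which is the learning difficulty the abstract attributes to gates operating near saturation.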

Making Efficient Use of Demonstrations to Solve Hard Exploration Problems

Sep 03, 2019

Tom Le Paine, Caglar Gulcehre, Bobak Shahriari, Misha Denil, Matt Hoffman, Hubert Soyer, Richard Tanburn, Steven Kapturowski, Neil Rabinowitz, Duncan Williams(+4 more)

Abstract: This paper introduces R2D3, an agent that makes efficient use of demonstrations to solve hard exploration problems in partially observable environments with highly variable initial conditions. We also introduce a suite of eight tasks that combine these three properties, and show that R2D3 can solve several of the tasks where other state-of-the-art methods (both with and without demonstrations) fail to see even a single successful trajectory after tens of billions of steps of exploration.

One-Shot High-Fidelity Imitation: Training Large-Scale Deep Nets with RL

Oct 11, 2018

Tom Le Paine, Sergio Gómez Colmenarejo, Ziyu Wang, Scott Reed, Yusuf Aytar, Tobias Pfaff, Matt W. Hoffman, Gabriel Barth-Maron, Serkan Cabi, David Budden(+1 more)

Abstract: Humans are experts at high-fidelity imitation: closely mimicking a demonstration, often in one attempt. Humans use this ability to quickly solve a task instance, and to bootstrap learning of new tasks. Achieving these abilities in autonomous agents is an open problem. In this paper, we introduce an off-policy RL algorithm (MetaMimic) to narrow this gap. MetaMimic can learn both (i) policies for high-fidelity one-shot imitation of diverse novel skills, and (ii) policies that enable the agent to solve tasks more efficiently than the demonstrators. MetaMimic relies on the principle of storing all experiences in a memory and replaying these to learn massive deep neural network policies by off-policy RL. This paper introduces, to the best of our knowledge, the largest existing neural networks for deep RL and shows that larger networks with normalization are needed to achieve one-shot high-fidelity imitation on a challenging manipulation task. The results also show that both types of policy can be learned from vision, in spite of the task rewards being sparse, and without access to demonstrator actions.

Playing hard exploration games by watching YouTube

May 29, 2018

Yusuf Aytar, Tobias Pfaff, David Budden, Tom Le Paine, Ziyu Wang, Nando de Freitas

Abstract: Deep reinforcement learning methods traditionally struggle with tasks where environment rewards are particularly sparse. One successful method of guiding exploration in these domains is to imitate trajectories provided by a human demonstrator. However, these demonstrations are typically collected under artificial conditions, i.e. with access to the agent's exact environment setup and the demonstrator's action and reward trajectories. Here we propose a two-stage method that overcomes these limitations by relying on noisy, unaligned footage without access to such data. First, we learn to map unaligned videos from multiple sources to a common representation using self-supervised objectives constructed over both time and modality (i.e. vision and sound). Second, we embed a single YouTube video in this representation to construct a reward function that encourages an agent to imitate human gameplay. This method of one-shot imitation allows our agent to convincingly exceed human-level performance on the infamously hard exploration games Montezuma's Revenge, Pitfall! and Private Eye for the first time, even if the agent is not presented with any environment rewards.

Fast Generation for Convolutional Autoregressive Models

Apr 20, 2017

Prajit Ramachandran, Tom Le Paine, Pooya Khorrami, Mohammad Babaeizadeh, Shiyu Chang, Yang Zhang, Mark A. Hasegawa-Johnson, Roy H. Campbell, Thomas S. Huang

Abstract: Convolutional autoregressive models have recently demonstrated state-of-the-art performance on a number of generation tasks. While fast, parallel training methods have been crucial for their success, generation is typically implemented in a naïve fashion where redundant computations are unnecessarily repeated. This results in slow generation, making such models infeasible for production environments. In this work, we describe a method to speed up generation in convolutional autoregressive models. The key idea is to cache hidden states to avoid redundant computation. We apply our fast generation method to the Wavenet and PixelCNN++ models and achieve up to 21× and 183× speedups respectively.

* Accepted at ICLR 2017 Workshop

Do Deep Neural Networks Learn Facial Action Units When Doing Expression Recognition?

Mar 16, 2017

Pooya Khorrami, Tom Le Paine, Thomas S. Huang

Abstract: Although convolutional neural networks (CNNs) have been the appearance-based classifier of choice in recent years, relatively few works have examined how much they can improve performance on accepted expression recognition benchmarks and, more importantly, what it is they actually learn. In this work, not only do we show that CNNs can achieve strong performance, but we also introduce an approach to decipher which portions of the face influence the CNN's predictions. First, we train a zero-bias CNN on facial expression data and achieve, to our knowledge, state-of-the-art performance on two expression recognition benchmarks: the extended Cohn-Kanade (CK+) dataset and the Toronto Face Dataset (TFD). We then qualitatively analyze the network by visualizing the spatial patterns that maximally excite different neurons in the convolutional layers and show how they resemble Facial Action Units (FAUs). Finally, we use the FAU labels provided in the CK+ dataset to verify that the FAUs observed in our filter visualizations indeed align with the subject's facial movements.

* Accepted at ICCV 2015 CV4AC Workshop. Corrected numbers in Tables 2 and 3

How Deep Neural Networks Can Improve Emotion Recognition on Video Data

Jan 10, 2017

Pooya Khorrami, Tom Le Paine, Kevin Brady, Charlie Dagli, Thomas S. Huang

Abstract: We consider the task of dimensional emotion recognition on video data using deep learning. While several previous methods have shown the benefits of training temporal neural network models such as recurrent neural networks (RNNs) on hand-crafted features, few works have considered combining convolutional neural networks (CNNs) with RNNs. In this work, we present a system that performs emotion recognition on video data using both CNNs and RNNs, and we also analyze how much each neural network component contributes to the system's overall performance. We present our findings on videos from the Audio/Visual+Emotion Challenge (AV+EC2015). In our experiments, we analyze the effects of several hyperparameters on overall performance while also achieving superior performance to the baseline and other competing methods.

* Accepted at ICIP 2016. Fixed typo in Experiments section

Fast Wavenet Generation Algorithm

Nov 29, 2016

Tom Le Paine, Pooya Khorrami, Shiyu Chang, Yang Zhang, Prajit Ramachandran, Mark A. Hasegawa-Johnson, Thomas S. Huang

Abstract: This paper presents an efficient implementation of the Wavenet generation process called Fast Wavenet. Compared to a naive implementation that has complexity O(2^L) (L denotes the number of layers in the network), our proposed approach removes redundant convolution operations by caching previous calculations, thereby reducing the complexity to O(L) time. Timing experiments show significant advantages of our fast implementation over a naive one. While this method is presented for Wavenet, the same scheme can be applied anytime one wants to perform autoregressive generation or online prediction using a model with dilated convolution layers. The code for our method is publicly available.

* Technical Report
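
The caching idea in the Fast Wavenet abstract can be sketched with a toy model. This is not the authors' released code: it assumes scalar activations, filter width 2, dilation 2**l at layer l, a tanh nonlinearity, and zero history before t = 0. The names `naive_output` and `CachedGenerator` are hypothetical. The naive version recomputes the whole dependency tree for each output, while the cached version keeps one queue of past inputs per layer and does one update per layer per step.

```python
import math
from collections import deque

def naive_output(x, weights, t, layer=None, counter=None):
    """Recompute the top-layer value at time t from scratch by recursion."""
    if layer is None:
        layer = len(weights)
    if t < 0:
        return 0.0               # assumed zero history before the sequence starts
    if layer == 0:
        return x[t]              # layer 0 is the raw input
    if counter is not None:
        counter[0] += 1          # count one layer evaluation
    w_prev, w_cur = weights[layer - 1]
    dilation = 2 ** (layer - 1)
    prev = naive_output(x, weights, t - dilation, layer - 1, counter)
    cur = naive_output(x, weights, t, layer - 1, counter)
    return math.tanh(w_prev * prev + w_cur * cur)

class CachedGenerator:
    """Keeps, for each layer, a queue of that layer's last `dilation` inputs."""

    def __init__(self, weights):
        self.weights = weights
        self.queues = [deque([0.0] * (2 ** l)) for l in range(len(weights))]

    def step(self, x_t):
        h = x_t
        for (w_prev, w_cur), queue in zip(self.weights, self.queues):
            old = queue.popleft()    # this layer's input from `dilation` steps ago
            queue.append(h)          # remember the current input for later steps
            h = math.tanh(w_prev * old + w_cur * h)
        return h                     # one update per layer: O(L) work per step

weights = [(0.5, -0.3), (0.2, 0.7), (-0.4, 0.1), (0.6, 0.9)]   # L = 4 layers
x = [math.sin(0.3 * t) for t in range(32)]

gen = CachedGenerator(weights)
cached = [gen.step(v) for v in x]
naive = [naive_output(x, weights, t) for t in range(len(x))]
print(max(abs(a - b) for a, b in zip(cached, naive)))

calls = [0]
naive_output(x, weights, 15, counter=calls)
print("naive layer evaluations for one output:", calls[0])
```

In this toy setting the naive recursion performs 2^L - 1 layer evaluations once the full receptive field lies inside the sequence, whereas `step` performs L. A fixed input sequence is used here for comparison; in autoregressive generation the next input would instead be derived from the previous output.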
