Nando de Freitas

University of British Columbia

Learning to Perform Physics Experiments via Deep Reinforcement Learning

Aug 17, 2017

Misha Denil, Pulkit Agrawal, Tejas D Kulkarni, Tom Erez, Peter Battaglia, Nando de Freitas

Abstract:When encountering novel objects, humans are able to infer a wide range of physical properties such as mass, friction and deformability by interacting with them in a goal-driven way. This process of active interaction is in the same spirit as a scientist performing experiments to discover hidden facts. Recent advances in artificial intelligence have yielded machines that can achieve superhuman performance in Go, Atari, natural language processing, and complex control problems; however, it is not clear that these systems can rival the scientific intuition of even a young child. In this work we introduce a basic set of tasks that require agents to estimate properties such as mass and cohesion of objects in an interactive simulated environment where they can manipulate the objects and observe the consequences. We found that state-of-the-art deep reinforcement learning methods can learn to perform the experiments necessary to discover such hidden properties. By systematically manipulating the problem difficulty and the cost incurred by the agent for performing experiments, we found that agents learn different strategies that balance the cost of gathering information against the cost of making mistakes in different situations.

Robust Imitation of Diverse Behaviors

Jul 14, 2017

Ziyu Wang, Josh Merel, Scott Reed, Greg Wayne, Nando de Freitas, Nicolas Heess

Abstract:Deep generative models have recently shown great promise in imitation learning for motor control. Given enough data, even supervised approaches can do one-shot imitation learning; however, they are vulnerable to cascading failures when the agent trajectory diverges from the demonstrations. Compared to purely supervised methods, Generative Adversarial Imitation Learning (GAIL) can learn more robust controllers from fewer demonstrations, but is inherently mode-seeking and more difficult to train. In this paper, we show how to combine the favourable aspects of these two approaches. The base of our model is a new type of variational autoencoder on demonstration trajectories that learns semantic policy embeddings. We show that these embeddings can be learned on a 9 DoF Jaco robot arm in reaching tasks, and that interpolating smoothly between them yields a correspondingly smooth interpolation of reaching behavior. Leveraging these policy representations, we develop a new version of GAIL that (1) is much more robust than the purely supervised controller, especially with few demonstrations, and (2) avoids mode collapse, capturing many diverse behaviors when GAIL on its own does not. We demonstrate our approach on learning diverse gaits from demonstration on a 2D biped and a 62 DoF 3D humanoid in the MuJoCo physics environment.

The Intentional Unintentional Agent: Learning to Solve Many Continuous Control Tasks Simultaneously

Jul 11, 2017

Serkan Cabi, Sergio Gómez Colmenarejo, Matthew W. Hoffman, Misha Denil, Ziyu Wang, Nando de Freitas

Abstract:This paper introduces the Intentional Unintentional (IU) agent. This agent endows the deep deterministic policy gradients (DDPG) agent for continuous control with the ability to solve several tasks simultaneously. Learning to solve many tasks simultaneously has been a long-standing, core goal of artificial intelligence, inspired by infant development and motivated by the desire to build flexible robot manipulators capable of many diverse behaviours. We show that the IU agent not only learns to solve many tasks simultaneously but also learns faster than agents that target a single task at a time. In some cases, where the single-task DDPG method completely fails, the IU agent successfully solves the task. To demonstrate this, we build a playroom environment using the MuJoCo physics engine, and introduce a grounded formal language to automatically generate tasks.

Sample Efficient Actor-Critic with Experience Replay

Jul 10, 2017

Ziyu Wang, Victor Bapst, Nicolas Heess, Volodymyr Mnih, Remi Munos, Koray Kavukcuoglu, Nando de Freitas

Abstract:This paper presents an actor-critic deep reinforcement learning agent with experience replay that is stable, sample efficient, and performs remarkably well on challenging environments, including the discrete 57-game Atari domain and several continuous control problems. To achieve this, the paper introduces several innovations, including truncated importance sampling with bias correction, stochastic dueling network architectures, and a new trust region policy optimization method.

* 20 pages. Prepared for ICLR 2017
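
The truncated importance sampling with bias correction named in the abstract can be illustrated with a minimal sketch. It only computes the two weighting coefficients for a discrete action space: the importance weight on the sampled action capped at a threshold c, and the correction weight [1 - c / rho(a)]_+ that is non-zero only for actions whose weight exceeds c. The function name, argument names, and the default c = 10 are illustrative assumptions, not the authors' code, and the sketch assumes every behaviour-policy probability is strictly positive.

```python
import numpy as np

def truncation_coefficients(pi, mu, action, c=10.0):
    """Hypothetical helper: weights for truncated importance sampling.

    pi, mu : action probabilities under the target and behaviour policies
             (mu is assumed strictly positive everywhere).
    action : index of the action that was actually sampled from mu.
    c      : truncation threshold (illustrative default).
    """
    pi = np.asarray(pi, dtype=float)
    mu = np.asarray(mu, dtype=float)
    rho = pi / mu  # importance weight of every action

    # Weight on the sampled action, capped at c to bound the variance.
    truncated = min(c, rho[action])

    # Bias-correction weight [1 - c / rho(a)]_+ for every action.
    # Using max(rho, c) in the denominator gives 0 wherever rho <= c
    # and avoids dividing by zero when pi(a) = 0.
    correction = 1.0 - c / np.maximum(rho, c)
    return truncated, correction
```

For example, `truncation_coefficients([0.9, 0.1], [0.05, 0.95], action=0)` caps the weight of 18 on the sampled action at 10 and assigns a non-zero correction weight only to action 0.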

Programmable Agents

Jun 20, 2017

Misha Denil, Sergio Gómez Colmenarejo, Serkan Cabi, David Saxton, Nando de Freitas

Abstract:We build deep RL agents that execute declarative programs expressed in formal language. The agents learn to ground the terms in this language in their environment, and can generalize their behavior at test time to execute new programs that refer to objects that were not referenced during training. The agents develop disentangled interpretable representations that allow them to generalize to a wide variety of zero-shot semantic tasks.

Learning to Learn without Gradient Descent by Gradient Descent

Jun 12, 2017

Yutian Chen, Matthew W. Hoffman, Sergio Gomez Colmenarejo, Misha Denil, Timothy P. Lillicrap, Matt Botvinick, Nando de Freitas

Abstract:We learn recurrent neural network optimizers trained on simple synthetic functions by gradient descent. We show that these learned optimizers exhibit a remarkable degree of transfer in that they can be used to efficiently optimize a broad range of derivative-free black-box functions, including Gaussian process bandits, simple control objectives, global optimization benchmarks and hyper-parameter tuning tasks. Up to the training horizon, the learned optimizers learn to trade off exploration and exploitation, and compare favourably with heavily engineered Bayesian optimization packages for hyper-parameter tuning.

* Accepted by ICML 2017. Previous version "Learning to Learn for Global Optimization of Black Box Functions" was published in the Deep Reinforcement Learning Workshop, NIPS 2016

Parallel Multiscale Autoregressive Density Estimation

Mar 10, 2017

Scott Reed, Aäron van den Oord, Nal Kalchbrenner, Sergio Gómez Colmenarejo, Ziyu Wang, Dan Belov, Nando de Freitas

Abstract:PixelCNN achieves state-of-the-art results in density estimation for natural images. Although training is fast, inference is costly, requiring one network evaluation per pixel; O(N) for N pixels. This can be sped up by caching activations, but still involves generating each pixel sequentially. In this work, we propose a parallelized PixelCNN that allows more efficient inference by modeling certain pixel groups as conditionally independent. Our new PixelCNN model achieves competitive density estimation and orders of magnitude speedup - O(log N) sampling instead of O(N) - enabling the practical generation of 512x512 images. We evaluate the model on class-conditional image generation, text-to-image synthesis, and action-conditional video generation, showing that our model achieves the best results among non-pixel-autoregressive density models that allow efficient sampling.
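
To give a feel for the gap between O(N) and O(log N) sampling at the 512x512 resolution mentioned above, the sketch below compares the pixel count with its base-2 logarithm. It is only an order-of-magnitude illustration: the helper name is hypothetical, and the constant factors and the actual pixel grouping used by the model are not represented.

```python
def sequential_step_orders(height, width):
    """Hypothetical helper: return (N, ceil(log2 N)) for an image of N pixels.

    N stands in for one network evaluation per pixel; ceil(log2 N) stands in
    for a sampler whose sequential depth grows logarithmically. Constant
    factors are deliberately ignored.
    """
    n = height * width
    # For n >= 1, (n - 1).bit_length() equals ceil(log2(n)) in exact integer arithmetic.
    log_n = (n - 1).bit_length()
    return n, log_n

n, log_n = sequential_step_orders(512, 512)
print(n, log_n)
```

For a 512x512 image, N is 262,144 while log2 N is 18, which is the scale of difference the abstract describes as orders of magnitude.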

LipNet: End-to-End Sentence-level Lipreading

Dec 16, 2016

Yannis M. Assael, Brendan Shillingford, Shimon Whiteson, Nando de Freitas

Abstract:Lipreading is the task of decoding text from the movement of a speaker's mouth. Traditional approaches separated the problem into two stages: designing or learning visual features, and prediction. More recent deep lipreading approaches are end-to-end trainable (Wand et al., 2016; Chung & Zisserman, 2016a). However, existing work on models trained end-to-end performs only word classification, rather than sentence-level sequence prediction. Studies have shown that human lipreading performance increases for longer words (Easton & Basala, 1982), indicating the importance of features capturing temporal context in an ambiguous communication channel. Motivated by this observation, we present LipNet, a model that maps a variable-length sequence of video frames to text, making use of spatiotemporal convolutions, a recurrent network, and the connectionist temporal classification loss, trained entirely end-to-end. To the best of our knowledge, LipNet is the first end-to-end sentence-level lipreading model that simultaneously learns spatiotemporal visual features and a sequence model. On the GRID corpus, LipNet achieves 95.2% accuracy on the sentence-level, overlapped speaker split task, outperforming experienced human lipreaders and the previous 86.4% word-level state-of-the-art accuracy (Gergen et al., 2016).

Learning to learn by gradient descent by gradient descent

Nov 30, 2016

Marcin Andrychowicz, Misha Denil, Sergio Gomez, Matthew W. Hoffman, David Pfau, Tom Schaul, Brendan Shillingford, Nando de Freitas

Abstract:The move from hand-designed features to learned features in machine learning has been wildly successful. In spite of this, optimization algorithms are still designed by hand. In this paper we show how the design of an optimization algorithm can be cast as a learning problem, allowing the algorithm to learn to exploit structure in the problems of interest in an automatic way. Our learned algorithms, implemented by LSTMs, outperform generic, hand-designed competitors on the tasks for which they are trained, and also generalize well to new tasks with similar structure. We demonstrate this on a number of tasks, including simple convex problems, training neural networks, and styling images with neural art.

Learning to Communicate with Deep Multi-Agent Reinforcement Learning

May 24, 2016

Jakob N. Foerster, Yannis M. Assael, Nando de Freitas, Shimon Whiteson

Abstract:We consider the problem of multiple agents sensing and acting in environments with the goal of maximising their shared utility. In these environments, agents must learn communication protocols in order to share information that is needed to solve the tasks. By embracing deep neural networks, we are able to demonstrate end-to-end learning of protocols in complex environments inspired by communication riddles and multi-agent computer vision problems with partial observability. We propose two approaches for learning in these domains: Reinforced Inter-Agent Learning (RIAL) and Differentiable Inter-Agent Learning (DIAL). The former uses deep Q-learning, while the latter exploits the fact that, during learning, agents can backpropagate error derivatives through (noisy) communication channels. Hence, this approach uses centralised learning but decentralised execution. Our experiments introduce new environments for studying the learning of communication protocols and present a set of engineering innovations that are essential for success in these domains.
