Serkan Cabi

A Framework for Data-Driven Robotics

Sep 26, 2019

Serkan Cabi, Sergio Gómez Colmenarejo, Alexander Novikov, Ksenia Konyushkova, Scott Reed, Rae Jeong, Konrad Żołna, Yusuf Aytar, David Budden, Mel Vecerik(+6 more)

Abstract: We present a framework for data-driven robotics that makes use of a large dataset of recorded robot experience and scales to several tasks using learned reward functions. We show how to apply this framework to accomplish three different object manipulation tasks on a real robot platform. Given demonstrations of a task together with task-agnostic recorded experience, we use a special form of human annotation as supervision to learn a reward function, which enables us to deal with real-world tasks where the reward signal cannot be acquired directly. Learned rewards are used in combination with a large dataset of experience from different tasks to learn a robot policy offline using batch RL. We show that using our approach it is possible to train agents to perform a variety of challenging manipulation tasks including stacking rigid objects and handling cloth.

One-Shot High-Fidelity Imitation: Training Large-Scale Deep Nets with RL

Oct 11, 2018

Tom Le Paine, Sergio Gómez Colmenarejo, Ziyu Wang, Scott Reed, Yusuf Aytar, Tobias Pfaff, Matt W. Hoffman, Gabriel Barth-Maron, Serkan Cabi, David Budden(+1 more)

Abstract: Humans are experts at high-fidelity imitation -- closely mimicking a demonstration, often in one attempt. Humans use this ability to quickly solve a task instance, and to bootstrap learning of new tasks. Achieving these abilities in autonomous agents is an open problem. In this paper, we introduce an off-policy RL algorithm (MetaMimic) to narrow this gap. MetaMimic can learn both (i) policies for high-fidelity one-shot imitation of diverse novel skills, and (ii) policies that enable the agent to solve tasks more efficiently than the demonstrators. MetaMimic relies on the principle of storing all experiences in a memory and replaying these to learn massive deep neural network policies by off-policy RL. This paper introduces, to the best of our knowledge, the largest existing neural networks for deep RL and shows that larger networks with normalization are needed to achieve one-shot high-fidelity imitation on a challenging manipulation task. The results also show that both types of policy can be learned from vision, in spite of the task rewards being sparse, and without access to demonstrator actions.

Reinforcement and Imitation Learning for Diverse Visuomotor Skills

May 27, 2018

Yuke Zhu, Ziyu Wang, Josh Merel, Andrei Rusu, Tom Erez, Serkan Cabi, Saran Tunyasuvunakool, János Kramár, Raia Hadsell, Nando de Freitas(+1 more)

Abstract: We propose a model-free deep reinforcement learning method that leverages a small amount of demonstration data to assist a reinforcement learning agent. We apply this approach to robotic manipulation tasks and train end-to-end visuomotor policies that map directly from RGB camera inputs to joint velocities. We demonstrate that our approach can solve a wide variety of visuomotor tasks, for which engineering a scripted controller would be laborious. In experiments, our reinforcement and imitation agent achieves significantly better performance than agents trained with reinforcement learning or imitation learning alone. We also illustrate that these policies, trained with large visual and dynamics variations, can achieve preliminary successes in zero-shot sim2real transfer. A brief visual description of this work can be viewed at https://youtu.be/EDl8SQUNjj0

* 13 pages, 6 figures, Published in RSS 2018

Learning Awareness Models

Apr 17, 2018

Brandon Amos, Laurent Dinh, Serkan Cabi, Thomas Rothörl, Sergio Gómez Colmenarejo, Alistair Muldal, Tom Erez, Yuval Tassa, Nando de Freitas, Misha Denil

Abstract: We consider the setting of an agent with a fixed body interacting with an unknown and uncertain external world. We show that models trained to predict proprioceptive information about the agent's body come to represent objects in the external world. In spite of being trained with only internally available signals, these dynamic body models come to represent external objects through the necessity of predicting their effects on the agent's own body. That is, the model learns holistic persistent representations of objects in the world, even though the only training signals are body signals. Our dynamics model is able to successfully predict distributions over 132 sensor readings over 100 steps into the future and we demonstrate that even when the body is no longer in contact with an object, the latent variables of the dynamics model continue to represent its shape. We show that active data collection by maximizing the entropy of predictions about the body (touch sensors, proprioception and vestibular information) leads to learning of dynamic models that show superior performance when used for control. We also collect data from a real robotic hand and show that the same models can be used to answer questions about properties of objects in the real world. Videos with qualitative results of our models are available at https://goo.gl/mZuqAV.

* Accepted to ICLR 2018

The Intentional Unintentional Agent: Learning to Solve Many Continuous Control Tasks Simultaneously

Jul 11, 2017

Serkan Cabi, Sergio Gómez Colmenarejo, Matthew W. Hoffman, Misha Denil, Ziyu Wang, Nando de Freitas

Abstract: This paper introduces the Intentional Unintentional (IU) agent. This agent endows the deep deterministic policy gradients (DDPG) agent for continuous control with the ability to solve several tasks simultaneously. Learning to solve many tasks simultaneously has been a long-standing, core goal of artificial intelligence, inspired by infant development and motivated by the desire to build flexible robot manipulators capable of many diverse behaviours. We show that the IU agent not only learns to solve many tasks simultaneously but also learns faster than agents that target a single task at a time. In some cases, where the single-task DDPG method completely fails, the IU agent successfully solves the task. To demonstrate this, we build a playroom environment using the MuJoCo physics engine, and introduce a grounded formal language to automatically generate tasks.

Programmable Agents

Jun 20, 2017

Misha Denil, Sergio Gómez Colmenarejo, Serkan Cabi, David Saxton, Nando de Freitas

Abstract: We build deep RL agents that execute declarative programs expressed in a formal language. The agents learn to ground the terms in this language in their environment, and can generalize their behavior at test time to execute new programs that refer to objects that were not referenced during training. The agents develop disentangled interpretable representations that allow them to generalize to a wide variety of zero-shot semantic tasks.
