Coline Devin

RoboCat: A Self-Improving Foundation Agent for Robotic Manipulation

Jun 20, 2023
Konstantinos Bousmalis, Giulia Vezzani, Dushyant Rao, Coline Devin, Alex X. Lee, Maria Bauza, Todor Davchev, Yuxiang Zhou, Agrim Gupta, Akhil Raju, Antoine Laurens, Claudio Fantacci, Valentin Dalibard, Martina Zambelli, Murilo Martins, Rugile Pevceviciute, Michiel Blokzijl, Misha Denil, Nathan Batchelor, Thomas Lampe, Emilio Parisotto, Konrad Żołna, Scott Reed, Sergio Gómez Colmenarejo, Jon Scholz, Abbas Abdolmaleki, Oliver Groth, Jean-Baptiste Regli, Oleg Sushkov, Tom Rothörl, José Enrique Chen, Yusuf Aytar, Dave Barker, Joy Ortiz, Martin Riedmiller, Jost Tobias Springenberg, Raia Hadsell, Francesco Nori, Nicolas Heess

The ability to leverage heterogeneous robotic experience from different robots and tasks to quickly master novel skills and embodiments has the potential to transform robot learning. Inspired by recent advances in foundation models for vision and language, we propose a foundation agent for robotic manipulation. This agent, named RoboCat, is a visual goal-conditioned decision transformer capable of consuming multi-embodiment action-labelled visual experience. This data spans a large repertoire of motor control skills from simulated and real robotic arms with varying sets of observations and actions. With RoboCat, we demonstrate the ability to generalise to new tasks and robots, both zero-shot and through adaptation using only 100–1000 examples for the target task. We also show how a trained model itself can be used to generate data for subsequent training iterations, thus providing a basic building block for an autonomous improvement loop. We investigate the agent's capabilities with large-scale evaluations both in simulation and on three different real robot embodiments. We find that as we grow and diversify its training data, RoboCat not only shows signs of cross-task transfer, but also becomes more efficient at adapting to new tasks.
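
The self-improvement loop described above can be made concrete with a short sketch. The code below is a hypothetical rendering, not the authors' implementation: `Episode`, `finetune`, and `rollout` are illustrative names, and filtering self-generated rollouts by success is an assumption about how the new data would be selected.

```python
# A minimal sketch of one self-improvement round, under the assumptions above:
# fine-tune on a few hundred target-task demonstrations, roll out the
# fine-tuned agent to self-generate episodes, keep the successful ones, and
# fold them back into the generalist's training set.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Episode:
    observations: list   # e.g. camera images
    actions: list        # embodiment-specific action vectors
    success: bool        # outcome label used to filter self-generated rollouts

def self_improvement_round(
    finetune: Callable[[List[Episode]], object],      # demos -> specialist policy
    rollout: Callable[[object, int], List[Episode]],  # (policy, n) -> episodes
    generalist_data: List[Episode],
    target_demos: List[Episode],                      # 100-1000 examples, per the abstract
    num_rollouts: int = 1000,
) -> List[Episode]:
    """Adapt to the target task, self-generate data, and grow the dataset."""
    specialist = finetune(target_demos)
    new_episodes = [ep for ep in rollout(specialist, num_rollouts) if ep.success]
    # The enlarged, more diverse dataset is what the next generalist trains on.
    return generalist_data + target_demos + new_episodes
```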

How to Spend Your Robot Time: Bridging Kickstarting and Offline Reinforcement Learning for Vision-based Robotic Manipulation

May 06, 2022
Alex X. Lee, Coline Devin, Jost Tobias Springenberg, Yuxiang Zhou, Thomas Lampe, Abbas Abdolmaleki, Konstantinos Bousmalis

Reinforcement learning (RL) has been shown to be effective at learning control from experience. However, RL typically requires a large amount of online interaction with the environment. This limits its applicability to real-world settings, such as in robotics, where such interaction is expensive. In this work we investigate ways to minimize online interactions in a target task by reusing a suboptimal policy we might have access to, for example from training on related prior tasks or in simulation. To this end, we develop two RL algorithms that can speed up training by using not only the action distributions of teacher policies, but also data collected by such policies on the task at hand. We conduct a thorough experimental study of how to use suboptimal teachers on a challenging vision-based robotic stacking benchmark with diverse objects. We compare our methods to offline, online, offline-to-online, and kickstarting RL algorithms, and find that training on data from both the teacher and the student enables the best performance for limited data budgets. We examine how to best allocate a limited data budget on the target task between the teacher and the student policy, and report experiments using varying budgets, two teachers with different degrees of suboptimality, and five stacking tasks that require a diverse set of behaviors. Our analysis, both in simulation and in the real world, shows that our approach is the best across data budgets, while standard offline RL from teacher rollouts is surprisingly effective when enough data is given.
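
A rough sketch of the central idea, combining a behavioral-cloning-style loss on data collected by both teacher and student with a kickstarting term that matches the teacher's action distribution, is given below. This is generic PyTorch under assumed interfaces (discrete actions, `policy` and `teacher` as logit-producing networks), not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def mixed_update(policy, teacher, optimizer, batch, distill_weight=0.1):
    """One gradient step on a batch drawn from teacher *and* student rollouts."""
    obs, actions = batch["observations"], batch["actions"]
    logits = policy(obs)
    # Supervised term on data collected by both policies on the task at hand.
    bc_loss = F.cross_entropy(logits, actions)
    # Kickstarting term: stay close to the teacher's action distribution.
    with torch.no_grad():
        teacher_logits = teacher(obs)
    distill_loss = F.kl_div(
        F.log_softmax(logits, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="batchmean",
    )
    loss = bc_loss + distill_weight * distill_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In kickstarting as originally proposed, the distillation weight is annealed over training so the student remains free to eventually outperform a suboptimal teacher.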

Beyond Pick-and-Place: Tackling Robotic Stacking of Diverse Shapes

Nov 03, 2021
Alex X. Lee, Coline Devin, Yuxiang Zhou, Thomas Lampe, Konstantinos Bousmalis, Jost Tobias Springenberg, Arunkumar Byravan, Abbas Abdolmaleki, Nimrod Gileadi, David Khosid, Claudio Fantacci, Jose Enrique Chen, Akhil Raju, Rae Jeong, Michael Neunert, Antoine Laurens, Stefano Saliceti, Federico Casarini, Martin Riedmiller, Raia Hadsell, Francesco Nori

We study the problem of robotic stacking with objects of complex geometry. We propose a challenging and diverse set of such objects that was carefully designed to require strategies beyond a simple "pick-and-place" solution. Our method is a reinforcement learning (RL) approach combined with vision-based interactive policy distillation and simulation-to-reality transfer. Our learned policies can efficiently handle multiple object combinations in the real world and exhibit a large variety of stacking skills. In a large experimental study, we investigate what choices matter for learning such general vision-based agents in simulation, and what affects optimal transfer to the real robot. We then leverage data collected by such policies and improve upon them with offline RL. A video and a blog post of our work are provided as supplementary material.
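
The "interactive policy distillation" step can be read as a DAgger-style loop: the vision-based student acts in simulation while an expert relabels its observations with target actions. The sketch below is an interpretation under that assumption, with `student`, `expert`, and `env` as illustrative stand-ins rather than the paper's components.

```python
def distill_interactively(student, expert, env, dataset, steps=10_000, fit_every=100):
    """Student drives data collection; expert supplies the action labels."""
    obs = env.reset()
    for t in range(steps):
        dataset.append((obs, expert.act(obs)))   # expert relabels the student's observation
        obs, done = env.step(student.act(obs))   # but the student's actions steer the data
        if done:
            obs = env.reset()
        if t % fit_every == 0:
            student.fit(dataset)                 # periodic supervised update
    return student
```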

* CoRL 2021. Video: https://dpmd.ai/robotics-stacking-YT . Blog: https://dpmd.ai/robotics-stacking . Code: https://github.com/deepmind/rgb_stacking 

Fully Autonomous Real-World Reinforcement Learning for Mobile Manipulation

Aug 03, 2021
Charles Sun, Jędrzej Orbik, Coline Devin, Brian Yang, Abhishek Gupta, Glen Berseth, Sergey Levine

We study how robots can autonomously learn skills that require a combination of navigation and grasping. While reinforcement learning in principle provides for automated robotic skill learning, in practice reinforcement learning in the real world is challenging and often requires extensive instrumentation and supervision. Our aim is to devise a robotic reinforcement learning system for learning navigation and manipulation together, in an autonomous way without human intervention, enabling continual learning under realistic assumptions. Our proposed system, ReLMM, can learn continuously on a real-world platform without any environment instrumentation, without human intervention, and without access to privileged information, such as maps, object positions, or a global view of the environment. Our method employs a modularized policy with components for manipulation and navigation, where manipulation policy uncertainty drives exploration for the navigation controller, and the manipulation module provides rewards for navigation. We evaluate our method on a room cleanup task, where the robot must navigate to and pick up items scattered on the floor. After a grasp curriculum training phase, ReLMM can learn navigation and grasping together fully automatically, in around 40 hours of autonomous real-world training.
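
The coupling between the two modules can be sketched in a few lines. Everything below is an assumed interface (method names like `predict_success` and `attempt_grasp` are illustrative, not ReLMM's API): the grasp module's predicted success probability supplies an exploration signal for navigation, and the binary grasp outcome supplies navigation's reward, so no human-specified reward is needed.

```python
def relmm_step(nav_policy, grasp_policy, env, nav_buffer):
    obs = env.observe()
    # Uncertainty proxy: p * (1 - p) peaks where grasp success is least
    # predictable, steering navigation toward places worth practicing in.
    p = grasp_policy.predict_success(obs)
    nav_action = nav_policy.act(obs, exploration_bonus=p * (1 - p))
    env.navigate(nav_action)
    grasped = grasp_policy.attempt_grasp(env)    # self-supervised binary outcome
    nav_buffer.add(obs, nav_action, reward=float(grasped))
    grasp_policy.update(env.observe(), grasped)  # grasping learns from its own outcome
    nav_policy.update(nav_buffer)                # navigation learns from grasp rewards
    return grasped
```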

* 16 pages 

ReLMM: Practical RL for Learning Mobile Manipulation Skills Using Only Onboard Sensors

Jul 28, 2021
Charles Sun, Jędrzej Orbik, Coline Devin, Brian Yang, Abhishek Gupta, Glen Berseth, Sergey Levine

In this paper, we study how robots can autonomously learn skills that require a combination of navigation and grasping. Learning robotic skills in the real world remains challenging without large-scale data collection and supervision. Our aim is to devise a robotic reinforcement learning system for learning navigation and manipulation together, in an autonomous way without human intervention, enabling continual learning under realistic assumptions. Specifically, our system, ReLMM, can learn continuously on a real-world platform without any environment instrumentation, without human intervention, and without access to privileged information, such as maps, object positions, or a global view of the environment. Our method employs a modularized policy with components for manipulation and navigation, where uncertainty over the manipulation success drives exploration for the navigation controller, and the manipulation module provides rewards for navigation. We evaluate our method on a room cleanup task, where the robot must navigate to and pick up items scattered on the floor. After a grasp curriculum training phase, ReLMM can learn navigation and grasping together fully automatically, in around 40 hours of real-world training.

* 17 pages 

Modularity Improves Out-of-Domain Instruction Following

Oct 24, 2020
Rodolfo Corona, Daniel Fried, Coline Devin, Dan Klein, Trevor Darrell

We propose a modular architecture for following natural language instructions that describe sequences of diverse subgoals, such as navigating to landmarks or picking up objects. Standard, non-modular architectures used in instruction following do not exploit subgoal compositionality and often struggle on out-of-distribution tasks and environments. In our approach, subgoal modules each carry out natural language instructions for a specific subgoal type. A sequence of modules to execute is chosen by learning to segment the instructions and predicting a subgoal type for each segment. When compared to standard sequence-to-sequence approaches on ALFRED, a challenging instruction following benchmark, we find that modularization improves generalization to environments unseen in training and to novel tasks.
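
The control flow the abstract describes reduces to "segment, classify, dispatch". The sketch below is schematic, with `segmenter`, `classifier`, and `modules` as placeholder components rather than the paper's code:

```python
def follow_instructions(instruction, segmenter, classifier, modules, env):
    """Execute a long instruction as a sequence of typed subgoals."""
    for segment in segmenter(instruction):   # e.g. "go to the lamp", "pick up the mug"
        subgoal_type = classifier(segment)   # e.g. "navigate" or "pickup"
        module = modules[subgoal_type]       # one specialist policy per subgoal type
        env = module.execute(segment, env)   # each module sees only its own segment
    return env
```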

Self-Supervised Goal-Conditioned Pick and Place

Aug 26, 2020
Coline Devin, Payam Rowghanian, Chris Vigorito, Will Richards, Khashayar Rohanimanesh

Robots have the capability to collect large amounts of data autonomously by interacting with objects in the world. However, it is often not obvious how to learn from autonomously collected data without human-labeled supervision. In this work we learn pixel-wise object representations from unsupervised pick and place data that generalize to new objects. We introduce a framework that uses these representations to predict where to pick and where to place in order to match a goal image. Finally, we demonstrate the utility of our approach in a simulated grasping environment.
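
One plausible reading of the inference procedure, sketched under assumptions (the `embed` network and the matching rule are illustrative, not the paper's method): compare pixel-wise features of the current and goal images to decide where to pick, then find where the picked object's feature appears in the goal image to decide where to place.

```python
import numpy as np

def pick_and_place_from_goal(embed, current_img, goal_img):
    cur = embed(current_img)    # (H, W, D) pixel-wise object features
    goal = embed(goal_img)
    # A pixel whose feature differs between current and goal images marks an
    # object that still needs to move: pick there.
    mismatch = np.linalg.norm(cur - goal, axis=-1)
    pick = np.unravel_index(np.argmax(mismatch), mismatch.shape)
    # Place wherever the goal image best matches the picked pixel's feature.
    sims = goal @ cur[pick]     # (H, W) similarity map
    place = np.unravel_index(np.argmax(sims), sims.shape)
    return pick, place
```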

* In RSS 2020 Visual Learning and Reasoning for Robotic Manipulation Workshop 

Learning To Reach Goals Without Reinforcement Learning

Dec 13, 2019
Dibya Ghosh, Abhishek Gupta, Justin Fu, Ashwin Reddy, Coline Devin, Benjamin Eysenbach, Sergey Levine

Imitation learning algorithms provide a simple and straightforward approach for training control policies via supervised learning. By maximizing the likelihood of good actions provided by an expert demonstrator, supervised imitation learning can produce effective policies without the algorithmic complexities and optimization challenges of reinforcement learning, at the cost of requiring an expert demonstrator to provide the demonstrations. In this paper, we ask: can we take insights from imitation learning to design algorithms that can effectively acquire optimal policies from scratch without any expert demonstrations? The key observation that makes this possible is that, in the multi-task setting, trajectories that are generated by a suboptimal policy can still serve as optimal examples for other tasks. In particular, when tasks correspond to different goals, every trajectory is a successful demonstration for the goal state that it actually reaches. We propose a simple algorithm for learning goal-reaching behaviors without any demonstrations, complicated user-provided reward functions, or complex reinforcement learning methods. Our method simply maximizes the likelihood of actions the agent actually took in its own previous rollouts, conditioned on the goal being the state that it actually reached. Although related variants of this approach have been proposed previously in imitation learning with demonstrations, we show how this approach can effectively learn goal-reaching policies from scratch. We present a theoretical result linking self-supervised imitation learning and reinforcement learning, and empirical results showing that it performs competitively with more complex reinforcement learning methods on a range of challenging goal reaching problems, while yielding advantages in terms of stability and use of offline data.
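
The core procedure the abstract states, maximizing the likelihood of the agent's own past actions conditioned on the goals it actually reached, fits in a few lines of generic PyTorch. The trajectory container, discrete action space, and `policy(state, goal)` signature below are assumptions for illustration, not the authors' code.

```python
import torch
import torch.nn.functional as F

def hindsight_supervised_update(policy, optimizer, trajectories):
    """Every trajectory is treated as an expert demo for the goal it reached."""
    losses = []
    for traj in trajectories:
        goal = traj.states[-1]              # hindsight goal: the state actually reached
        for state, action in zip(traj.states[:-1], traj.actions):
            logits = policy(state, goal)    # goal-conditioned action logits
            losses.append(F.cross_entropy(logits.unsqueeze(0), action.unsqueeze(0)))
    loss = torch.stack(losses).mean()       # plain maximum-likelihood objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the update is ordinary supervised learning, it avoids the value bootstrapping that complicates many RL methods, which is consistent with the stability advantages the abstract reports.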

* First two authors contributed equally 

SMiRL: Surprise Minimizing RL in Dynamic Environments

Dec 11, 2019
Glen Berseth, Daniel Geng, Coline Devin, Chelsea Finn, Dinesh Jayaraman, Sergey Levine

All living organisms struggle against the forces of nature to carve out niches where they can maintain homeostasis. We propose that such a search for order amidst chaos might offer a unifying principle for the emergence of useful behaviors in artificial agents. We formalize this idea into an unsupervised reinforcement learning method called surprise minimizing RL (SMiRL). SMiRL trains an agent with the objective of maximizing the probability of observed states under a model trained on previously seen states. The resulting agents can acquire proactive behaviors that seek out and maintain stable conditions, such as balancing and damage avoidance, that are closely tied to an environment's prevailing sources of entropy, such as wind, earthquakes, and other agents. We demonstrate that our surprise-minimizing agents can successfully play Tetris and Doom, control a humanoid to avoid falls, and navigate to escape enemy agents, without any task-specific reward supervision. We further show that SMiRL can be used together with a standard task reward to accelerate reward-driven learning.
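
The stated objective, rewarding the agent with the probability of observed states under a model of previously seen states, can be illustrated with the simplest possible density model. The diagonal-Gaussian choice below is an assumption for illustration, not necessarily the state model used in the paper.

```python
import numpy as np

class SurpriseMinimizer:
    """Reward = log p(state) under a Gaussian fit to previously visited states."""

    def __init__(self, eps=1e-6):
        self.states = []   # running buffer of visited states
        self.eps = eps     # variance floor for numerical stability

    def reward(self, state):
        state = np.asarray(state, dtype=np.float64)
        if self.states:
            past = np.stack(self.states)
            mu, var = past.mean(axis=0), past.var(axis=0) + self.eps
            # Diagonal-Gaussian log-density: high where the state is familiar.
            r = float(-0.5 * np.sum(np.log(2 * np.pi * var) + (state - mu) ** 2 / var))
        else:
            r = 0.0        # no model yet on the very first state
        self.states.append(state)   # the model is refit as more states are seen
        return r
```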

* Preprint 