Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Pieter Abbeel

UC Berkeley

Language-Conditioned Path Planning

Aug 31, 2023

Amber Xie, Youngwoon Lee, Pieter Abbeel, Stephen James

Abstract:Contact is at the core of robotic manipulation. At times, it is desired (e.g. manipulation and grasping), and at times, it is harmful (e.g. when avoiding obstacles). However, traditional path planning algorithms focus solely on collision-free paths, limiting their applicability in contact-rich tasks. To address this limitation, we propose the domain of Language-Conditioned Path Planning, where contact-awareness is incorporated into the path planning problem. As a first step in this domain, we propose Language-Conditioned Collision Functions (LACO) a novel approach that learns a collision function using only a single-view image, language prompt, and robot configuration. LACO predicts collisions between the robot and the environment, enabling flexible, conditional path planning without the need for manual object annotations, point cloud data, or ground-truth object meshes. In both simulation and the real world, we demonstrate that LACO can facilitate complex, nuanced path plans that allow for interaction with objects that are safe to collide, rather than prohibiting any collision.

* Conference on Robot Learning, 2023

Via

Access Paper or Ask Questions

Language Reward Modulation for Pretraining Reinforcement Learning

Aug 23, 2023

Ademi Adeniji, Amber Xie, Carmelo Sferrazza, Younggyo Seo, Stephen James, Pieter Abbeel

Abstract:Using learned reward functions (LRFs) as a means to solve sparse-reward reinforcement learning (RL) tasks has yielded some steady progress in task-complexity through the years. In this work, we question whether today's LRFs are best-suited as a direct replacement for task rewards. Instead, we propose leveraging the capabilities of LRFs as a pretraining signal for RL. Concretely, we propose $\textbf{LA}$nguage Reward $\textbf{M}$odulated $\textbf{P}$retraining (LAMP) which leverages the zero-shot capabilities of Vision-Language Models (VLMs) as a $\textit{pretraining}$ utility for RL as opposed to a downstream task reward. LAMP uses a frozen, pretrained VLM to scalably generate noisy, albeit shaped exploration rewards by computing the contrastive alignment between a highly diverse collection of language instructions and the image observations of an agent in its pretraining environment. LAMP optimizes these rewards in conjunction with standard novelty-seeking exploration rewards with reinforcement learning to acquire a language-conditioned, pretrained policy. Our VLM pretraining approach, which is a departure from previous attempts to use LRFs, can warmstart sample-efficient learning on robot manipulation tasks in RLBench.

* Code available at https://github.com/ademiadeniji/lamp

Via

Access Paper or Ask Questions

The Impact of Overall Optimization on Warehouse Automation

Aug 11, 2023

Hiroshi Yoshitake, Pieter Abbeel

Abstract:In this study, we propose a novel approach for investigating optimization performance by flexible robot coordination in automated warehouses with multi-agent reinforcement learning (MARL)-based control. Automated systems using robots are expected to achieve efficient operations compared with manual systems in terms of overall optimization performance. However, the impact of overall optimization on performance remains unclear in most automated systems due to a lack of suitable control methods. Thus, we proposed a centralized training-and-decentralized execution MARL framework as a practical overall optimization control method. In the proposed framework, we also proposed a single shared critic, trained with global states and rewards, applicable to a case in which heterogeneous agents make decisions asynchronously. Our proposed MARL framework was applied to the task selection of material handling equipment through automated order picking simulation, and its performance was evaluated to determine how far overall optimization outperforms partial optimization by comparing it with other MARL frameworks and rule-based control methods.

* 8 pages, 4 figures, accepted at International Conference on Intelligent Robots and Systems (IROS2023)

Via

Access Paper or Ask Questions

Convolutional Occupancy Models for Dense Packing of Complex, Novel Objects

Jul 31, 2023

Nikhil Mishra, Pieter Abbeel, Xi Chen, Maximilian Sieb

Abstract:Dense packing in pick-and-place systems is an important feature in many warehouse and logistics applications. Prior work in this space has largely focused on planning algorithms in simulation, but real-world packing performance is often bottlenecked by the difficulty of perceiving 3D object geometry in highly occluded, partially observed scenes. In this work, we present a fully-convolutional shape completion model, F-CON, which can be easily combined with off-the-shelf planning methods for dense packing in the real world. We also release a simulated dataset, COB-3D-v2, that can be used to train shape completion models for real-word robotics applications, and use it to demonstrate that F-CON outperforms other state-of-the-art shape completion methods. Finally, we equip a real-world pick-and-place system with F-CON, and demonstrate dense packing of complex, unseen objects in cluttered scenes. Across multiple planning methods, F-CON enables substantially better dense packing than other shape completion methods.

* In IROS 2023. Code and dataset are available at https://sites.google.com/view/fcon-packing/

Via

Access Paper or Ask Questions

Learning to Model the World with Language

Jul 31, 2023

Jessy Lin, Yuqing Du, Olivia Watkins, Danijar Hafner, Pieter Abbeel, Dan Klein, Anca Dragan

Figure 1 for Learning to Model the World with Language

Figure 2 for Learning to Model the World with Language

Figure 3 for Learning to Model the World with Language

Figure 4 for Learning to Model the World with Language

Abstract:To interact with humans in the world, agents need to understand the diverse types of language that people use, relate them to the visual world, and act based on them. While current agents learn to execute simple language instructions from task rewards, we aim to build agents that leverage diverse language that conveys general knowledge, describes the state of the world, provides interactive feedback, and more. Our key idea is that language helps agents predict the future: what will be observed, how the world will behave, and which situations will be rewarded. This perspective unifies language understanding with future prediction as a powerful self-supervised learning objective. We present Dynalang, an agent that learns a multimodal world model that predicts future text and image representations and learns to act from imagined model rollouts. Unlike traditional agents that use language only to predict actions, Dynalang acquires rich language understanding by using past language also to predict future language, video, and rewards. In addition to learning from online interaction in an environment, Dynalang can be pretrained on datasets of text, video, or both without actions or rewards. From using language hints in grid worlds to navigating photorealistic scans of homes, Dynalang utilizes diverse types of language to improve task performance, including environment descriptions, game rules, and instructions.

* Website: https://dynalang.github.io/

Via

Access Paper or Ask Questions

SpawnNet: Learning Generalizable Visuomotor Skills from Pre-trained Networks

Jul 07, 2023

Xingyu Lin, John So, Sashwat Mahalingam, Fangchen Liu, Pieter Abbeel

Figure 1 for SpawnNet: Learning Generalizable Visuomotor Skills from Pre-trained Networks

Figure 2 for SpawnNet: Learning Generalizable Visuomotor Skills from Pre-trained Networks

Figure 3 for SpawnNet: Learning Generalizable Visuomotor Skills from Pre-trained Networks

Figure 4 for SpawnNet: Learning Generalizable Visuomotor Skills from Pre-trained Networks

Abstract:The existing internet-scale image and video datasets cover a wide range of everyday objects and tasks, bringing the potential of learning policies that have broad generalization. Prior works have explored visual pre-training with different self-supervised objectives, but the generalization capabilities of the learned policies remain relatively unknown. In this work, we take the first step towards this challenge, focusing on how pre-trained representations can help the generalization of the learned policies. We first identify the key bottleneck in using a frozen pre-trained visual backbone for policy learning. We then propose SpawnNet, a novel two-stream architecture that learns to fuse pre-trained multi-layer representations into a separate network to learn a robust policy. Through extensive simulated and real experiments, we demonstrate significantly better categorical generalization compared to prior approaches in imitation learning settings.

Via

Access Paper or Ask Questions

Improving Long-Horizon Imitation Through Instruction Prediction

Jun 21, 2023

Joey Hejna, Pieter Abbeel, Lerrel Pinto

Abstract:Complex, long-horizon planning and its combinatorial nature pose steep challenges for learning-based agents. Difficulties in such settings are exacerbated in low data regimes where over-fitting stifles generalization and compounding errors hurt accuracy. In this work, we explore the use of an often unused source of auxiliary supervision: language. Inspired by recent advances in transformer-based models, we train agents with an instruction prediction loss that encourages learning temporally extended representations that operate at a high level of abstraction. Concretely, we demonstrate that instruction modeling significantly improves performance in planning environments when training with a limited number of demonstrations on the BabyAI and Crafter benchmarks. In further analysis we find that instruction modeling is most important for tasks that require complex reasoning, while understandably offering smaller gains in environments that require simple plans. More details and code can be found at https://github.com/jhejna/instruction-prediction.

* Published at AAAI 2023

Via

Access Paper or Ask Questions

ALP: Action-Aware Embodied Learning for Perception

Jun 16, 2023

Xinran Liang, Anthony Han, Wilson Yan, Aditi Raghunathan, Pieter Abbeel

Figure 1 for ALP: Action-Aware Embodied Learning for Perception

Figure 2 for ALP: Action-Aware Embodied Learning for Perception

Figure 3 for ALP: Action-Aware Embodied Learning for Perception

Figure 4 for ALP: Action-Aware Embodied Learning for Perception

Abstract:Current methods in training and benchmarking vision models exhibit an over-reliance on passive, curated datasets. Although models trained on these datasets have shown strong performance in a wide variety of tasks such as classification, detection, and segmentation, they fundamentally are unable to generalize to an ever-evolving world due to constant out-of-distribution shifts of input data. Therefore, instead of training on fixed datasets, can we approach learning in a more human-centric and adaptive manner? In this paper, we introduce \textbf{A}ction-aware Embodied \textbf{L}earning for \textbf{P}erception (ALP), an embodied learning framework that incorporates action information into representation learning through a combination of optimizing policy gradients through reinforcement learning and inverse dynamics prediction objectives. Our method actively explores complex 3D environments to both learn generalizable task-agnostic representations as well as collect downstream training data. We show that ALP outperforms existing baselines in object detection and semantic segmentation. In addition, we show that by training on actively collected data more relevant to the environment and task, our method generalizes more robustly to downstream tasks compared to models pre-trained on fixed datasets such as ImageNet.

* preprint

Via

Access Paper or Ask Questions

Probabilistic Adaptation of Text-to-Video Models

Jun 02, 2023

Mengjiao Yang, Yilun Du, Bo Dai, Dale Schuurmans, Joshua B. Tenenbaum, Pieter Abbeel

Figure 1 for Probabilistic Adaptation of Text-to-Video Models

Figure 2 for Probabilistic Adaptation of Text-to-Video Models

Figure 3 for Probabilistic Adaptation of Text-to-Video Models

Figure 4 for Probabilistic Adaptation of Text-to-Video Models

Abstract:Large text-to-video models trained on internet-scale data have demonstrated exceptional capabilities in generating high-fidelity videos from arbitrary textual descriptions. However, adapting these models to tasks with limited domain-specific data, such as animation or robotics videos, poses a significant computational challenge, since finetuning a pretrained large model can be prohibitively expensive. Inspired by how a small modifiable component (e.g., prompts, prefix-tuning) can adapt a large language model to perform new tasks without requiring access to the model weights, we investigate how to adapt a large pretrained text-to-video model to a variety of downstream domains and tasks without finetuning. In answering this question, we propose Video Adapter, which leverages the score function of a large pretrained video diffusion model as a probabilistic prior to guide the generation of a task-specific small video model. Our experiments show that Video Adapter is capable of incorporating the broad knowledge and preserving the high fidelity of a large pretrained video model in a task-specific small video model that is able to generate high-quality yet specialized videos on a variety of tasks such as animation, egocentric modeling, and modeling of simulated and real-world robotics data. More videos can be found on the website https://video-adapter.github.io/.

* Project website https://video-adapter.github.io/. First two authors contributed equally

Via

Access Paper or Ask Questions

Train Offline, Test Online: A Real Robot Learning Benchmark

Jun 01, 2023

Gaoyue Zhou, Victoria Dean, Mohan Kumar Srirama, Aravind Rajeswaran, Jyothish Pari, Kyle Hatch, Aryan Jain, Tianhe Yu, Pieter Abbeel, Lerrel Pinto(+2 more)

Figure 1 for Train Offline, Test Online: A Real Robot Learning Benchmark

Figure 2 for Train Offline, Test Online: A Real Robot Learning Benchmark

Figure 3 for Train Offline, Test Online: A Real Robot Learning Benchmark

Figure 4 for Train Offline, Test Online: A Real Robot Learning Benchmark

Abstract:Three challenges limit the progress of robot learning research: robots are expensive (few labs can participate), everyone uses different robots (findings do not generalize across labs), and we lack internet-scale robotics data. We take on these challenges via a new benchmark: Train Offline, Test Online (TOTO). TOTO provides remote users with access to shared robotic hardware for evaluating methods on common tasks and an open-source dataset of these tasks for offline training. Its manipulation task suite requires challenging generalization to unseen objects, positions, and lighting. We present initial results on TOTO comparing five pretrained visual representations and four offline policy learning baselines, remotely contributed by five institutions. The real promise of TOTO, however, lies in the future: we release the benchmark for additional submissions from any user, enabling easy, direct comparison to several methods without the need to obtain hardware or collect data.

* Accepted to ICRA 2023

Via

Access Paper or Ask Questions