Ken Goldberg

AUTOLab at the University of California, Berkeley

Push-MOG: Efficient Pushing to Consolidate Polygonal Objects for Multi-Object Grasping

Jun 24, 2023

Shrey Aeron, Edith LLontop, Aviv Adler, Wisdom C. Agboh, Mehmet R Dogar, Ken Goldberg

Abstract: Recently, robots have seen rapidly increasing use in homes and warehouses to declutter by collecting objects from a planar surface and placing them into a container. While current techniques grasp objects individually, Multi-Object Grasping (MOG) can improve efficiency by increasing the average number of objects grasped per trip (OpT). However, grasping multiple objects requires the objects to be aligned and in close proximity. In this work, we propose Push-MOG, an algorithm that computes "fork pushing" actions using a parallel-jaw gripper to create graspable object clusters. In physical decluttering experiments, we find that Push-MOG enables multi-object grasps, increasing the average OpT by 34%. Code and videos will be available at https://sites.google.com/berkeley.edu/push-mog.
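The OpT metric mentioned in the abstract is a plain average, so a relative gain such as the reported 34% can be read as (new OpT - baseline OpT) / baseline OpT. The sketch below only illustrates that arithmetic; the function names and the trip logs are hypothetical and are not taken from the paper or its code.

```python
def objects_per_trip(grasp_counts):
    """Average number of objects carried to the container per trip (OpT)."""
    if not grasp_counts:
        raise ValueError("need at least one trip")
    return sum(grasp_counts) / len(grasp_counts)


def relative_gain(baseline, improved):
    """Fractional improvement of `improved` over `baseline`."""
    return (improved - baseline) / baseline


# Made-up trip logs for illustration only: objects removed on each trip.
single_object_trips = [1, 1, 1, 1, 1, 1]  # six objects, six trips
multi_object_trips = [2, 1, 2, 1]         # six objects, four trips

baseline_opt = objects_per_trip(single_object_trips)
multi_opt = objects_per_trip(multi_object_trips)
gain = relative_gain(baseline_opt, multi_opt)
```

With these invented logs the gain is larger than the figure in the abstract; the numbers are chosen only to keep the arithmetic easy to follow.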

* 6 pages, 4 figures, CASE 2023

Robot Learning with Sensorimotor Pre-training

Jun 16, 2023

Ilija Radosavovic, Baifeng Shi, Letian Fu, Ken Goldberg, Trevor Darrell, Jitendra Malik

Abstract: We present a self-supervised sensorimotor pre-training approach for robotics. Our model, called RPT, is a Transformer that operates on sequences of sensorimotor tokens. Given a sequence of camera images, proprioceptive robot states, and past actions, we encode the interleaved sequence into tokens, mask out a random subset, and train a model to predict the masked-out content. We hypothesize that if the robot can predict the missing content it has acquired a good model of the physical world that can enable it to act. RPT is designed to operate on latent visual representations which makes prediction tractable, enables scaling to 10x larger models, and 10 Hz inference on a real robot. To evaluate our approach, we collect a dataset of 20,000 real-world trajectories over 9 months using a combination of motion planning and model-based grasping algorithms. We find that pre-training on this data consistently outperforms training from scratch, leads to 2x improvements in the block stacking task, and has favorable scaling properties.
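The interleave-then-mask step described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the RPT implementation: the token layout, the `MASK` placeholder, the mask ratio, and the string stand-ins for latent image features are all hypothetical.

```python
import random

MASK = "<mask>"  # hypothetical placeholder for masked-out content


def interleave(images, states, actions):
    """Build one sequence ordered as image, state, action for each timestep."""
    if not (len(images) == len(states) == len(actions)):
        raise ValueError("all modalities need one entry per timestep")
    tokens = []
    for t, (img, state, act) in enumerate(zip(images, states, actions)):
        tokens.append(("image", t, img))
        tokens.append(("state", t, state))
        tokens.append(("action", t, act))
    return tokens


def mask_tokens(tokens, mask_ratio, rng):
    """Hide a random subset of tokens; return the corrupted sequence and targets."""
    n_masked = round(mask_ratio * len(tokens))
    masked_idx = sorted(rng.sample(range(len(tokens)), n_masked))
    targets = {i: tokens[i] for i in masked_idx}
    corrupted = [
        (kind, t, MASK) if i in targets else (kind, t, value)
        for i, (kind, t, value) in enumerate(tokens)
    ]
    return corrupted, targets


# Stand-in data: strings take the place of latent visual features and vectors.
tokens = interleave(
    ["img0", "img1", "img2", "img3"],
    ["q0", "q1", "q2", "q3"],
    ["a0", "a1", "a2", "a3"],
)
corrupted, targets = mask_tokens(tokens, mask_ratio=0.5, rng=random.Random(0))
```

A model trained on this objective would receive `corrupted` as input and be scored on how well it reconstructs the entries in `targets`.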

* Project page: https://robotic-pretrained-transformer.github.io

Video Prediction Models as Rewards for Reinforcement Learning

May 23, 2023

Alejandro Escontrela, Ademi Adeniji, Wilson Yan, Ajay Jain, Xue Bin Peng, Ken Goldberg, Youngwoon Lee, Danijar Hafner, Pieter Abbeel

Abstract: Specifying reward signals that allow agents to learn complex behaviors is a long-standing challenge in reinforcement learning. A promising approach is to extract preferences for behaviors from unlabeled videos, which are widely available on the internet. We present Video Prediction Rewards (VIPER), an algorithm that leverages pretrained video prediction models as action-free reward signals for reinforcement learning. Specifically, we first train an autoregressive transformer on expert videos and then use the video prediction likelihoods as reward signals for a reinforcement learning agent. VIPER enables expert-level control without programmatic task rewards across a wide range of DMC, Atari, and RLBench tasks. Moreover, generalization of the video prediction model allows us to derive rewards for an out-of-distribution environment where no expert data is available, enabling cross-embodiment generalization for tabletop manipulation. We see our work as a starting point for scalable reward specification from unlabeled videos that will benefit from the rapid advances in generative modeling. Source code and datasets are available on the project website: https://escontrela.me
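The idea of using prediction likelihoods as rewards can be illustrated with a toy stand-in. The sketch below swaps the autoregressive transformer for a smoothed bigram model over discrete frame tokens, which is an assumption made purely to keep the example self-contained; the class and function names are hypothetical and this is not the VIPER code.

```python
import math
from collections import Counter, defaultdict


class BigramFrameModel:
    """Toy next-frame model over discrete frame tokens (stand-in for a video model)."""

    def __init__(self, vocab_size, alpha=1.0):
        self.vocab_size = vocab_size
        self.alpha = alpha  # Laplace smoothing keeps unseen transitions finite
        self.counts = defaultdict(Counter)

    def fit(self, sequences):
        """Count frame-to-frame transitions in the expert sequences."""
        for seq in sequences:
            for prev, nxt in zip(seq, seq[1:]):
                self.counts[prev][nxt] += 1
        return self

    def log_prob(self, prev, nxt):
        """Smoothed log-probability of `nxt` following `prev`."""
        row = self.counts[prev]
        total = sum(row.values()) + self.alpha * self.vocab_size
        return math.log((row[nxt] + self.alpha) / total)


def likelihood_rewards(model, frames):
    """Per-step reward: log-likelihood of each observed next frame under the model."""
    return [model.log_prob(prev, nxt) for prev, nxt in zip(frames, frames[1:])]


# Hypothetical expert "videos" encoded as frame-token sequences.
model = BigramFrameModel(vocab_size=4).fit([[0, 1, 2, 3], [0, 1, 2, 3]])
expert_like = likelihood_rewards(model, [0, 1, 2, 3])
off_distribution = likelihood_rewards(model, [0, 3, 1, 0])
```

A rollout that follows expert-like transitions accumulates a higher total reward than one that does not, and no action labels or task-specific reward function are involved.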

* 20 pages, 15 figures, 4 tables. under review

More Than an Arm: Using a Manipulator as a Tail for Enhanced Stability in Legged Locomotion

May 02, 2023

Huang Huang, Antonio Loquercio, Ashish Kumar, Neerja Thakkar, Ken Goldberg, Jitendra Malik

Abstract: Is a manipulator on a legged robot a liability or an asset for locomotion? Prior works mainly designed specific controllers to account for the added payload and inertia from a manipulator. In contrast, biological systems typically benefit from additional limbs, which can simplify postural control. For instance, cats use their tails to enhance the stability of their bodies and prevent falls under disturbances. In this work, we show that a manipulator can be an important asset for maintaining balance during locomotion. To do so, we train a sensorimotor policy using deep reinforcement learning to create a synergy between the robot's limbs. This policy enables the robot to maintain stability despite large disturbances. However, learning such a controller can be quite challenging. To account for these challenges, we propose a stage-wise training procedure to learn complex behaviors. Our proposed method decomposes this complex task into three stages and then incrementally learns these tasks to arrive at a single policy capable of solving the final control task, achieving a success rate up to 2.35 times higher than baselines in simulation. We deploy our learned policy in the real world and show stability during locomotion under strong disturbances.

Bagging by Learning to Singulate Layers Using Interactive Perception

Mar 29, 2023

Lawrence Yunliang Chen, Baiyu Shi, Roy Lin, Daniel Seita, Ayah Ahmad, Richard Cheng, Thomas Kollar, David Held, Ken Goldberg

Abstract: Many fabric handling and 2D deformable material tasks in homes and industry require singulating layers of material such as opening a bag or arranging garments for sewing. In contrast to methods requiring specialized sensing or end effectors, we use only visual observations with ordinary parallel jaw grippers. We propose SLIP: Singulating Layers using Interactive Perception, and apply SLIP to the task of autonomous bagging. We develop SLIP-Bagging, a bagging algorithm that manipulates a plastic or fabric bag from an unstructured state, and uses SLIP to grasp the top layer of the bag to open it for object insertion. In physical experiments, a YuMi robot achieves a success rate of 67% to 81% across bags of a variety of materials, shapes, and sizes, significantly improving in success rate and generality over prior work. Experiments also suggest that SLIP can be applied to tasks such as singulating layers of folded cloth and garments. Supplementary material is available at https://sites.google.com/view/slip-bagging/.

LERF: Language Embedded Radiance Fields

Mar 16, 2023

Justin Kerr, Chung Min Kim, Ken Goldberg, Angjoo Kanazawa, Matthew Tancik

Abstract: Humans describe the physical world using natural language to refer to specific 3D locations based on a vast range of properties: visual appearance, semantics, abstract associations, or actionable affordances. In this work we propose Language Embedded Radiance Fields (LERFs), a method for grounding language embeddings from off-the-shelf models like CLIP into NeRF, which enable these types of open-ended language queries in 3D. LERF learns a dense, multi-scale language field inside NeRF by volume rendering CLIP embeddings along training rays, supervising these embeddings across training views to provide multi-view consistency and smooth the underlying language field. After optimization, LERF can extract 3D relevancy maps for a broad range of language prompts interactively in real-time, which has potential use cases in robotics, understanding vision-language models, and interacting with 3D scenes. LERF enables pixel-aligned, zero-shot queries on the distilled 3D CLIP embeddings without relying on region proposals or masks, supporting long-tail open-vocabulary queries hierarchically across the volume. The project website can be found at https://lerf.io .

Learning to Trace and Untangle Semi-planar Knots

Mar 15, 2023

Vainavi Viswanath, Kaushik Shivakumar, Jainil Ajmera, Mallika Parulekar, Justin Kerr, Jeffrey Ichnowski, Richard Cheng, Thomas Kollar, Ken Goldberg

Abstract: This paper extends prior work on untangling long cables and presents TUSK (Tracing to Untangle Semi-planar Knots), a learned cable-tracing algorithm that resolves over-crossings and under-crossings to recognize the structure of knots and grasp points for untangling from a single RGB image. This work focuses on semi-planar knots, which are knots composed of crossings that each include at most 2 cable segments. We conduct experiments on long cables (3 m in length) with up to 15 semi-planar crossings across 6 different knot types. Crops of crossings from 3 knots (overhand, figure 8, and bowline) of the 6 are seen during training, but none of the full knots are seen during training. This is an improvement from prior work on long cables that can only untangle 2 knot types. Experiments find that in settings with multiple identical cables, TUSK can trace a single cable with 81% accuracy on 7 new knot types. In single-cable images, TUSK can trace and identify the correct knot with 77% success on 3 new knot types. We incorporate TUSK into a bimanual robot system and find that it successfully untangles 64% of cable configurations, including those with new knots unseen during training, across 3 levels of difficulty. Supplementary material, including an annotated dataset of 500 RGB-D images of a knotted cable along with ground-truth traces, can be found at https://sites.google.com/view/tusk-rss.

From Occlusion to Insight: Object Search in Semantic Shelves using Large Language Models

Feb 24, 2023

Satvik Sharma, Kaushik Shivakumar, Huang Huang, Ryan Hoque, Alishba Imran, Brian Ichter, Ken Goldberg

Abstract: How can a robot efficiently extract a desired object from a shelf when it is fully occluded by other objects? Prior works propose geometric approaches for this problem but do not consider object semantics. Shelves in pharmacies, restaurant kitchens, and grocery stores are often organized such that semantically similar objects are placed close to one another. Can large language models (LLMs) serve as semantic knowledge sources to accelerate robotic mechanical search in semantically arranged environments? With Semantic Spatial Search on Shelves (S^4), we use LLMs to generate affinity matrices, where entries correspond to semantic likelihood of physical proximity between objects. We derive semantic spatial distributions by synthesizing semantics with learned geometric constraints. S^4 incorporates Optical Character Recognition (OCR) and semantic refinement with predictions from ViLD, an open-vocabulary object detection model. Simulation experiments suggest that semantic spatial search reduces the search time relative to pure spatial search by an average of 24% across three domains: pharmacy, kitchen, and office shelves. A manually collected dataset of 100 semantic scenes suggests that OCR and semantic refinement improve object detection accuracy by 35%. Lastly, physical experiments in a pharmacy shelf suggest 47.1% improvement over pure spatial search. Supplementary material can be found at https://sites.google.com/view/s4-rss/home.
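One way to picture how an affinity matrix and a geometric prior might be combined into a search distribution is sketched below. The combination rule (affinity weighted by an exponential distance falloff, multiplied by the prior, then normalized), the `scale` parameter, the affinity values, and the shelf layout are all assumptions made for illustration; the abstract does not specify how S^4 synthesizes the two sources.

```python
import math


def semantic_spatial_distribution(target, visible, affinity, candidates,
                                  geometric_prior, scale=0.2):
    """Score candidate shelf positions for a hidden target object.

    visible:         {object name: position along the shelf}
    affinity:        {(target, object name): proximity likelihood}
    geometric_prior: weight per candidate, e.g. from occlusion geometry
    """
    scores = []
    for x, prior in zip(candidates, geometric_prior):
        # Positions close to high-affinity visible objects score higher.
        semantic = sum(
            affinity[(target, name)] * math.exp(-abs(x - pos) / scale)
            for name, pos in visible.items()
        )
        scores.append(prior * semantic)
    total = sum(scores)
    if total <= 0:
        raise ValueError("no candidate position has positive score")
    return [s / total for s in scores]


# Hypothetical pharmacy shelf: affinities stand in for LLM-generated entries.
affinity = {("ibuprofen", "aspirin"): 0.9, ("ibuprofen", "shampoo"): 0.1}
visible = {"aspirin": 0.2, "shampoo": 0.8}
candidates = [0.1, 0.3, 0.7, 0.9]
distribution = semantic_spatial_distribution(
    "ibuprofen", visible, affinity, candidates, geometric_prior=[1, 1, 1, 1]
)
```

In this toy layout most of the probability mass lands on the two candidates next to the aspirin, so a search guided by the distribution would look there first.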

Automating Vascular Shunt Insertion with the dVRK Surgical Robot

Nov 04, 2022

Karthik Dharmarajan, Will Panitch, Muyan Jiang, Kishore Srinivas, Baiyu Shi, Yahav Avigal, Huang Huang, Thomas Low, Danyal Fer, Ken Goldberg

Abstract: Vascular shunt insertion is a fundamental surgical procedure used to temporarily restore blood flow to tissues. It is often performed in the field after major trauma. We formulate a problem of automated vascular shunt insertion and propose a pipeline to perform Automated Vascular Shunt Insertion (AVSI) using a da Vinci Research Kit. The pipeline uses a learned visual model to estimate the locus of the vessel rim, plans a grasp on the rim, and moves to grasp at that point. The first robot gripper then pulls the rim to stretch open the vessel with a dilation motion. The second robot gripper then proceeds to insert a shunt into the vessel phantom (a model of the blood vessel) with a chamfer tilt followed by a screw motion. Results suggest that AVSI achieves a high success rate even with tight tolerances and varying vessel orientations up to 30°. Supplementary material, dataset, videos, and visualizations can be found at https://sites.google.com/berkeley.edu/autolab-avsi.

AutoBag: Learning to Open Plastic Bags and Insert Objects

Oct 31, 2022

Lawrence Yunliang Chen, Baiyu Shi, Daniel Seita, Richard Cheng, Thomas Kollar, David Held, Ken Goldberg

Abstract: Thin plastic bags are ubiquitous in retail stores, healthcare, food handling, recycling, homes, and school lunchrooms. They are challenging both for perception (due to specularities and occlusions) and for manipulation (due to the dynamics of their 3D deformable structure). We formulate the task of manipulating common plastic shopping bags with two handles from an unstructured initial state to a state where solid objects can be inserted into the bag for transport. We propose a self-supervised learning framework where a dual-arm robot learns to recognize the handles and rim of plastic bags using UV-fluorescent markings; at execution time, the robot does not use UV markings or UV light. We propose Autonomous Bagging (AutoBag), where the robot uses the learned perception model to open plastic bags through iterative manipulation. We present novel metrics to evaluate the quality of a bag state and new motion primitives for reorienting and opening bags from visual observations. In physical experiments, a YuMi robot using AutoBag is able to open bags and achieve a success rate of 16/30 for inserting at least one item across a variety of initial bag configurations. Supplementary material is available at https://sites.google.com/view/autobag .
