Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ken Goldberg

Open X-Embodiment: Robotic Learning Datasets and RT-X Models

Oct 17, 2023
Open X-Embodiment Collaboration, Abhishek Padalkar, Acorn Pooley, Ajinkya Jain, Alex Bewley, Alex Herzog, Alex Irpan, Alexander Khazatsky, Anant Rai, Anikait Singh, Anthony Brohan, Antonin Raffin, Ayzaan Wahid, Ben Burgess-Limerick, Beomjoon Kim, Bernhard Schölkopf, Brian Ichter, Cewu Lu, Charles Xu, Chelsea Finn, Chenfeng Xu, Cheng Chi, Chenguang Huang, Christine Chan, Chuer Pan, Chuyuan Fu, Coline Devin, Danny Driess, Deepak Pathak, Dhruv Shah, Dieter Büchler, Dmitry Kalashnikov, Dorsa Sadigh, Edward Johns, Federico Ceola, Fei Xia, Freek Stulp, Gaoyue Zhou, Gaurav S. Sukhatme, Gautam Salhotra, Ge Yan, Giulio Schiavi, Gregory Kahn, Hao Su, Hao-Shu Fang, Haochen Shi, Heni Ben Amor, Henrik I Christensen, Hiroki Furuta, Homer Walke, Hongjie Fang, Igor Mordatch, Ilija Radosavovic, Isabel Leal, Jacky Liang, Jad Abou-Chakra, Jaehyung Kim, Jan Peters, Jan Schneider, Jasmine Hsu, Jeannette Bohg, Jeffrey Bingham, Jiajun Wu, Jialin Wu, Jianlan Luo, Jiayuan Gu, Jie Tan, Jihoon Oh, Jitendra Malik, Jonathan Tompson, Jonathan Yang, Joseph J. Lim, João Silvério, Junhyek Han, Kanishka Rao, Karl Pertsch, Karol Hausman, Keegan Go, Keerthana Gopalakrishnan, Ken Goldberg, Kendra Byrne, Kenneth Oslund, Kento Kawaharazuka, Kevin Zhang, Krishan Rana, Krishnan Srinivasan, Lawrence Yunliang Chen, Lerrel Pinto, Liam Tan, Lionel Ott, Lisa Lee, Masayoshi Tomizuka, Maximilian Du, Michael Ahn, Mingtong Zhang, Mingyu Ding, Mohan Kumar Srirama, Mohit Sharma, Moo Jin Kim, Naoaki Kanazawa, Nicklas Hansen, Nicolas Heess, Nikhil J Joshi, Niko Suenderhauf, Norman Di Palo, Nur Muhammad Mahi Shafiullah, Oier Mees, Oliver Kroemer, Pannag R Sanketi, Paul Wohlhart, Peng Xu, Pierre Sermanet, Priya Sundaresan, Quan Vuong, Rafael Rafailov, Ran Tian, Ria Doshi, Roberto Martín-Martín, Russell Mendonca, Rutav Shah, Ryan Hoque, Ryan Julian, Samuel Bustamante, Sean Kirmani, Sergey Levine, Sherry Moore, Shikhar Bahl, Shivin Dass, Shubham Sonawani, Shuran Song, Sichun Xu, Siddhant Haldar, Simeon Adebola, Simon Guist, Soroush Nasiriany, Stefan Schaal, Stefan Welker, Stephen Tian, Sudeep Dasari, Suneel Belkhale, Takayuki Osa, Tatsuya Harada, Tatsuya Matsushima, Ted Xiao, Tianhe Yu, Tianli Ding, Todor Davchev, Tony Z. Zhao, Travis Armstrong, Trevor Darrell, Vidhi Jain, Vincent Vanhoucke, Wei Zhan, Wenxuan Zhou, Wolfram Burgard, Xi Chen, Xiaolong Wang, Xinghao Zhu, Xuanlin Li, Yao Lu, Yevgen Chebotar, Yifan Zhou, Yifeng Zhu, Ying Xu, Yixuan Wang, Yonatan Bisk, Yoonyoung Cho, Youngwoon Lee, Yuchen Cui, Yueh-Hua Wu, Yujin Tang, Yuke Zhu, Yunzhu Li, Yusuke Iwasawa, Yutaka Matsuo, Zhuo Xu, Zichen Jeff Cui

Figure 1 for Open X-Embodiment: Robotic Learning Datasets and RT-X Models

Figure 2 for Open X-Embodiment: Robotic Learning Datasets and RT-X Models

Figure 3 for Open X-Embodiment: Robotic Learning Datasets and RT-X Models

Figure 4 for Open X-Embodiment: Robotic Learning Datasets and RT-X Models

Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning methods train a separate model for every application, every robot, and even every environment. Can we instead train generalist X-robot policy that can be adapted efficiently to new robots, tasks, and environments? In this paper, we provide datasets in standardized data formats and models to make it possible to explore this possibility in the context of robotic manipulation, alongside experimental results that provide an example of effective X-robot policies. We assemble a dataset from 22 different robots collected through a collaboration between 21 institutions, demonstrating 527 skills (160266 tasks). We show that a high-capacity model trained on this data, which we call RT-X, exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms. More details can be found on the project website $\href{https://robotics-transformer-x.github.io}{\text{robotics-transformer-x.github.io}}$.

Via

Access Paper or Ask Questions

Language Embedded Radiance Fields for Zero-Shot Task-Oriented Grasping

Sep 18, 2023
Adam Rashid, Satvik Sharma, Chung Min Kim, Justin Kerr, Lawrence Chen, Angjoo Kanazawa, Ken Goldberg

Figure 1 for Language Embedded Radiance Fields for Zero-Shot Task-Oriented Grasping

Figure 2 for Language Embedded Radiance Fields for Zero-Shot Task-Oriented Grasping

Figure 3 for Language Embedded Radiance Fields for Zero-Shot Task-Oriented Grasping

Figure 4 for Language Embedded Radiance Fields for Zero-Shot Task-Oriented Grasping

Grasping objects by a specific part is often crucial for safety and for executing downstream tasks. Yet, learning-based grasp planners lack this behavior unless they are trained on specific object part data, making it a significant challenge to scale object diversity. Instead, we propose LERF-TOGO, Language Embedded Radiance Fields for Task-Oriented Grasping of Objects, which uses vision-language models zero-shot to output a grasp distribution over an object given a natural language query. To accomplish this, we first reconstruct a LERF of the scene, which distills CLIP embeddings into a multi-scale 3D language field queryable with text. However, LERF has no sense of objectness, meaning its relevancy outputs often return incomplete activations over an object which are insufficient for subsequent part queries. LERF-TOGO mitigates this lack of spatial grouping by extracting a 3D object mask via DINO features and then conditionally querying LERF on this mask to obtain a semantic distribution over the object with which to rank grasps from an off-the-shelf grasp planner. We evaluate LERF-TOGO's ability to grasp task-oriented object parts on 31 different physical objects, and find it selects grasps on the correct part in 81% of all trials and grasps successfully in 69%. See the project website at: lerftogo.github.io

* See the project website at: lerftogo.github.io

Via

Access Paper or Ask Questions

Self-Supervised Learning for Interactive Perception of Surgical Thread for Autonomous Suture Tail-Shortening

Jul 13, 2023
Vincent Schorp, Will Panitch, Kaushik Shivakumar, Vainavi Viswanath, Justin Kerr, Yahav Avigal, Danyal M Fer, Lionel Ott, Ken Goldberg

Figure 1 for Self-Supervised Learning for Interactive Perception of Surgical Thread for Autonomous Suture Tail-Shortening

Figure 2 for Self-Supervised Learning for Interactive Perception of Surgical Thread for Autonomous Suture Tail-Shortening

Figure 3 for Self-Supervised Learning for Interactive Perception of Surgical Thread for Autonomous Suture Tail-Shortening

Figure 4 for Self-Supervised Learning for Interactive Perception of Surgical Thread for Autonomous Suture Tail-Shortening

Accurate 3D sensing of suturing thread is a challenging problem in automated surgical suturing because of the high state-space complexity, thinness and deformability of the thread, and possibility of occlusion by the grippers and tissue. In this work we present a method for tracking surgical thread in 3D which is robust to occlusions and complex thread configurations, and apply it to autonomously perform the surgical suture "tail-shortening" task: pulling thread through tissue until a desired "tail" length remains exposed. The method utilizes a learned 2D surgical thread detection network to segment suturing thread in RGB images. It then identifies the thread path in 2D and reconstructs the thread in 3D as a NURBS spline by triangulating the detections from two stereo cameras. Once a 3D thread model is initialized, the method tracks the thread across subsequent frames. Experiments suggest the method achieves a 1.33 pixel average reprojection error on challenging single-frame 3D thread reconstructions, and an 0.84 pixel average reprojection error on two tracking sequences. On the tail-shortening task, it accomplishes a 90% success rate across 20 trials. Supplemental materials are available at https://sites.google.com/berkeley.edu/autolab-surgical-thread/ .

* International Conference on Automation Science and Engineering (CASE) 2023, 7 pages

Via

Access Paper or Ask Questions

The Busboy Problem: Efficient Tableware Decluttering Using Consolidation and Multi-Object Grasps

Jul 08, 2023
Kishore Srinivas, Shreya Ganti, Rishi Parikh, Ayah Ahmad, Wisdom Agboh, Mehmet Dogar, Ken Goldberg

Figure 1 for The Busboy Problem: Efficient Tableware Decluttering Using Consolidation and Multi-Object Grasps

Figure 2 for The Busboy Problem: Efficient Tableware Decluttering Using Consolidation and Multi-Object Grasps

Figure 3 for The Busboy Problem: Efficient Tableware Decluttering Using Consolidation and Multi-Object Grasps

Figure 4 for The Busboy Problem: Efficient Tableware Decluttering Using Consolidation and Multi-Object Grasps

We present the "Busboy Problem": automating an efficient decluttering of cups, bowls, and silverware from a planar surface. As grasping and transporting individual items is highly inefficient, we propose policies to generate grasps for multiple items. We introduce the metric of Objects per Trip (OpT) carried by the robot to the collection bin to analyze the improvement seen as a result of our policies. In physical experiments with singulated items, we find that consolidation and multi-object grasps resulted in an 1.8x improvement in OpT, compared to methods without multi-object grasps. See https://sites.google.com/berkeley.edu/busboyproblem for code and supplemental materials.

Via

Access Paper or Ask Questions

Can Machines Garden? Systematically Comparing the AlphaGarden vs. Professional Horticulturalists

Jun 29, 2023
Simeon Adebola, Rishi Parikh, Mark Presten, Satvik Sharma, Shrey Aeron, Ananth Rao, Sandeep Mukherjee, Tomson Qu, Christina Wistrom, Eugen Solowjow, Ken Goldberg

Figure 1 for Can Machines Garden? Systematically Comparing the AlphaGarden vs. Professional Horticulturalists

Figure 2 for Can Machines Garden? Systematically Comparing the AlphaGarden vs. Professional Horticulturalists

Figure 3 for Can Machines Garden? Systematically Comparing the AlphaGarden vs. Professional Horticulturalists

Figure 4 for Can Machines Garden? Systematically Comparing the AlphaGarden vs. Professional Horticulturalists

The AlphaGarden is an automated testbed for indoor polyculture farming which combines a first-order plant simulator, a gantry robot, a seed planting algorithm, plant phenotyping and tracking algorithms, irrigation sensors and algorithms, and custom pruning tools and algorithms. In this paper, we systematically compare the performance of the AlphaGarden to professional horticulturalists on the staff of the UC Berkeley Oxford Tract Greenhouse. The humans and the machine tend side-by-side polyculture gardens with the same seed arrangement. We compare performance in terms of canopy coverage, plant diversity, and water consumption. Results from two 60-day cycles suggest that the automated AlphaGarden performs comparably to professional horticulturalists in terms of coverage and diversity, and reduces water consumption by as much as 44%. Code, videos, and datasets are available at https://sites.google.com/berkeley.edu/systematiccomparison.

* International Conference on Robotics and Automation(ICRA) 2023 Oral

Via

Access Paper or Ask Questions

FogROS2-SGC: A ROS2 Cloud Robotics Platform for Secure Global Connectivity

Jun 29, 2023
Kaiyuan Chen, Ryan Hoque, Karthik Dharmarajan, Edith LLontop, Simeon Adebola, Jeffrey Ichnowski, John Kubiatowicz, Ken Goldberg

Figure 1 for FogROS2-SGC: A ROS2 Cloud Robotics Platform for Secure Global Connectivity

Figure 2 for FogROS2-SGC: A ROS2 Cloud Robotics Platform for Secure Global Connectivity

Figure 3 for FogROS2-SGC: A ROS2 Cloud Robotics Platform for Secure Global Connectivity

Figure 4 for FogROS2-SGC: A ROS2 Cloud Robotics Platform for Secure Global Connectivity

The Robot Operating System (ROS2) is the most widely used software platform for building robotics applications. FogROS2 extends ROS2 to allow robots to access cloud computing on demand. However, ROS2 and FogROS2 assume that all robots are locally connected and that each robot has full access and control of the other robots. With applications like distributed multi-robot systems, remote robot control, and mobile robots, robotics increasingly involves the global Internet and complex trust management. Existing approaches for connecting disjoint ROS2 networks lack key features such as security, compatibility, efficiency, and ease of use. We introduce FogROS2-SGC, an extension of FogROS2 that can effectively connect robot systems across different physical locations, networks, and Data Distribution Services (DDS). With globally unique and location-independent identifiers, FogROS2-SGC securely and efficiently routes data between robotics components around the globe. FogROS2-SGC is agnostic to the ROS2 distribution and configuration, is compatible with non-ROS2 software, and seamlessly extends existing ROS2 applications without any code modification. Experiments suggest FogROS2-SGC is 19x faster than rosbridge (a ROS2 package with comparable features, but lacking security). We also apply FogROS2-SGC to 4 robots and compute nodes that are 3600km apart. Videos and code are available on the project website https://sites.google.com/view/fogros2-sgc.

* 9 pages, 8 figures

Via

Access Paper or Ask Questions

IIFL: Implicit Interactive Fleet Learning from Heterogeneous Human Supervisors

Jun 27, 2023
Gaurav Datta, Ryan Hoque, Anrui Gu, Eugen Solowjow, Ken Goldberg

Figure 1 for IIFL: Implicit Interactive Fleet Learning from Heterogeneous Human Supervisors

Figure 2 for IIFL: Implicit Interactive Fleet Learning from Heterogeneous Human Supervisors

Figure 3 for IIFL: Implicit Interactive Fleet Learning from Heterogeneous Human Supervisors

Figure 4 for IIFL: Implicit Interactive Fleet Learning from Heterogeneous Human Supervisors

Imitation learning has been applied to a range of robotic tasks, but can struggle when (1) robots encounter edge cases that are not represented in the training data (distribution shift) or (2) the human demonstrations are heterogeneous: taking different paths around an obstacle, for instance (multimodality). Interactive fleet learning (IFL) mitigates distribution shift by allowing robots to access remote human teleoperators during task execution and learn from them over time, but is not equipped to handle multimodality. Recent work proposes Implicit Behavior Cloning (IBC), which is able to represent multimodal demonstrations using energy-based models (EBMs). In this work, we propose addressing both multimodality and distribution shift with Implicit Interactive Fleet Learning (IIFL), the first extension of implicit policies to interactive imitation learning (including the single-robot, single-human setting). IIFL quantifies uncertainty using a novel application of Jeffreys divergence to EBMs. While IIFL is more computationally expensive than explicit methods, results suggest that IIFL achieves 4.5x higher return on human effort in simulation experiments and an 80% higher success rate in a physical block pushing task over (Explicit) IFL, IBC, and other baselines when human supervision is heterogeneous.

Via

Access Paper or Ask Questions

Push-MOG: Efficient Pushing to Consolidate Polygonal Objects for Multi-Object Grasping

Jun 24, 2023
Shrey Aeron, Edith LLontop, Aviv Adler, Wisdom C. Agboh, Mehmet R Dogar, Ken Goldberg

Figure 1 for Push-MOG: Efficient Pushing to Consolidate Polygonal Objects for Multi-Object Grasping

Figure 2 for Push-MOG: Efficient Pushing to Consolidate Polygonal Objects for Multi-Object Grasping

Figure 3 for Push-MOG: Efficient Pushing to Consolidate Polygonal Objects for Multi-Object Grasping

Figure 4 for Push-MOG: Efficient Pushing to Consolidate Polygonal Objects for Multi-Object Grasping

Recently, robots have seen rapidly increasing use in homes and warehouses to declutter by collecting objects from a planar surface and placing them into a container. While current techniques grasp objects individually, Multi-Object Grasping (MOG) can improve efficiency by increasing the average number of objects grasped per trip (OpT). However, grasping multiple objects requires the objects to be aligned and in close proximity. In this work, we propose Push-MOG, an algorithm that computes "fork pushing" actions using a parallel-jaw gripper to create graspable object clusters. In physical decluttering experiments, we find that Push-MOG enables multi-object grasps, increasing the average OpT by 34%. Code and videos will be available at https://sites.google.com/berkeley.edu/push-mog.

* 6 pages, 4 figures, CASE 2023

Via

Access Paper or Ask Questions