
Hongjie Fang

Open X-Embodiment: Robotic Learning Datasets and RT-X Models

Oct 17, 2023
Open X-Embodiment Collaboration, Abhishek Padalkar, Acorn Pooley, Ajinkya Jain, Alex Bewley, Alex Herzog, Alex Irpan, Alexander Khazatsky, Anant Rai, Anikait Singh, Anthony Brohan, Antonin Raffin, Ayzaan Wahid, Ben Burgess-Limerick, Beomjoon Kim, Bernhard Schölkopf, Brian Ichter, Cewu Lu, Charles Xu, Chelsea Finn, Chenfeng Xu, Cheng Chi, Chenguang Huang, Christine Chan, Chuer Pan, Chuyuan Fu, Coline Devin, Danny Driess, Deepak Pathak, Dhruv Shah, Dieter Büchler, Dmitry Kalashnikov, Dorsa Sadigh, Edward Johns, Federico Ceola, Fei Xia, Freek Stulp, Gaoyue Zhou, Gaurav S. Sukhatme, Gautam Salhotra, Ge Yan, Giulio Schiavi, Gregory Kahn, Hao Su, Hao-Shu Fang, Haochen Shi, Heni Ben Amor, Henrik I Christensen, Hiroki Furuta, Homer Walke, Hongjie Fang, Igor Mordatch, Ilija Radosavovic, Isabel Leal, Jacky Liang, Jad Abou-Chakra, Jaehyung Kim, Jan Peters, Jan Schneider, Jasmine Hsu, Jeannette Bohg, Jeffrey Bingham, Jiajun Wu, Jialin Wu, Jianlan Luo, Jiayuan Gu, Jie Tan, Jihoon Oh, Jitendra Malik, Jonathan Tompson, Jonathan Yang, Joseph J. Lim, João Silvério, Junhyek Han, Kanishka Rao, Karl Pertsch, Karol Hausman, Keegan Go, Keerthana Gopalakrishnan, Ken Goldberg, Kendra Byrne, Kenneth Oslund, Kento Kawaharazuka, Kevin Zhang, Krishan Rana, Krishnan Srinivasan, Lawrence Yunliang Chen, Lerrel Pinto, Liam Tan, Lionel Ott, Lisa Lee, Masayoshi Tomizuka, Maximilian Du, Michael Ahn, Mingtong Zhang, Mingyu Ding, Mohan Kumar Srirama, Mohit Sharma, Moo Jin Kim, Naoaki Kanazawa, Nicklas Hansen, Nicolas Heess, Nikhil J Joshi, Niko Suenderhauf, Norman Di Palo, Nur Muhammad Mahi Shafiullah, Oier Mees, Oliver Kroemer, Pannag R Sanketi, Paul Wohlhart, Peng Xu, Pierre Sermanet, Priya Sundaresan, Quan Vuong, Rafael Rafailov, Ran Tian, Ria Doshi, Roberto Martín-Martín, Russell Mendonca, Rutav Shah, Ryan Hoque, Ryan Julian, Samuel Bustamante, Sean Kirmani, Sergey Levine, Sherry Moore, Shikhar Bahl, Shivin Dass, Shubham Sonawani, Shuran Song, Sichun Xu, Siddhant Haldar, Simeon Adebola, Simon Guist, Soroush Nasiriany, Stefan Schaal, Stefan Welker, Stephen Tian, Sudeep Dasari, Suneel Belkhale, Takayuki Osa, Tatsuya Harada, Tatsuya Matsushima, Ted Xiao, Tianhe Yu, Tianli Ding, Todor Davchev, Tony Z. Zhao, Travis Armstrong, Trevor Darrell, Vidhi Jain, Vincent Vanhoucke, Wei Zhan, Wenxuan Zhou, Wolfram Burgard, Xi Chen, Xiaolong Wang, Xinghao Zhu, Xuanlin Li, Yao Lu, Yevgen Chebotar, Yifan Zhou, Yifeng Zhu, Ying Xu, Yixuan Wang, Yonatan Bisk, Yoonyoung Cho, Youngwoon Lee, Yuchen Cui, Yueh-Hua Wu, Yujin Tang, Yuke Zhu, Yunzhu Li, Yusuke Iwasawa, Yutaka Matsuo, Zhuo Xu, Zichen Jeff Cui

Large, high-capacity models trained on diverse datasets have shown remarkable success in efficiently tackling downstream applications. In domains from NLP to computer vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning methods train a separate model for every application, every robot, and even every environment. Can we instead train a generalist X-robot policy that can be adapted efficiently to new robots, tasks, and environments? In this paper, we provide datasets in standardized data formats, along with models, to make it possible to explore this possibility in the context of robotic manipulation, together with experimental results that provide an example of effective X-robot policies. We assemble a dataset from 22 different robots, collected through a collaboration between 21 institutions and demonstrating 527 skills (160,266 tasks). We show that a high-capacity model trained on this data, which we call RT-X, exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms. More details can be found on the project website: https://robotics-transformer-x.github.io.
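
The premise above hinges on pooling demonstrations from many robots into one standardized format so that a single policy can consume them all. Below is a minimal, illustrative sketch of that idea in Python; the field names (rgb, instruction, action) and the 7-DoF action size are assumptions made for illustration, not the actual Open X-Embodiment schema.

```python
# A sketch of a standardized, cross-robot episode format. Field names and
# shapes are illustrative assumptions, not the Open X-Embodiment schema.
from dataclasses import dataclass
import numpy as np

@dataclass
class Step:
    rgb: np.ndarray          # camera image, H x W x 3
    instruction: str         # natural-language task description
    action: np.ndarray       # e.g. 7-DoF end-effector delta + gripper

@dataclass
class Episode:
    robot: str               # embodiment identifier, e.g. "franka", "xarm"
    steps: list

def to_training_pairs(episodes):
    """Flatten episodes from many robots into (observation, action) pairs
    that a single high-capacity policy can be trained on."""
    pairs = []
    for ep in episodes:
        for step in ep.steps:
            obs = {"rgb": step.rgb, "instruction": step.instruction, "robot": ep.robot}
            pairs.append((obs, step.action))
    return pairs

# Toy usage: two episodes from two different embodiments share one format.
eps = [
    Episode("franka", [Step(np.zeros((224, 224, 3), np.uint8), "pick up the cup", np.zeros(7))]),
    Episode("xarm",   [Step(np.zeros((224, 224, 3), np.uint8), "open the drawer", np.zeros(7))]),
]
print(len(to_training_pairs(eps)))  # -> 2
```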

Low-Cost Exoskeletons for Learning Whole-Arm Manipulation in the Wild

Sep 26, 2023
Hongjie Fang, Hao-Shu Fang, Yiming Wang, Jieji Ren, Jingjing Chen, Ruo Zhang, Weiming Wang, Cewu Lu

While humans can use parts of their arms other than the hands for manipulations like gathering and supporting, whether robots can effectively learn and perform the same type of operations remains relatively unexplored. Because these manipulations require joint-level control to regulate the complete poses of the robot, we develop AirExo, a low-cost, adaptable, and portable dual-arm exoskeleton for teleoperation and demonstration collection. Since collecting teleoperated data is expensive and time-consuming, we further leverage AirExo to collect cheap in-the-wild demonstrations at scale. Under our in-the-wild learning framework, we show that with only 3 minutes of teleoperated demonstrations, augmented by diverse and extensive in-the-wild data collected by AirExo, robots can learn a policy that is comparable to, or even better than, one learned from over 20 minutes of teleoperated demonstrations. Experiments demonstrate that our approach enables the model to learn a more general and robust policy across the various stages of the task, improving task-completion success rates even in the presence of disturbances. Project website: https://airexo.github.io/

* Project page: https://airexo.github.io/ 
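
A minimal sketch of the data-mixing idea described above: a small teleoperated set grounded on the real robot is combined with a much larger pool of cheap in-the-wild demonstrations when assembling behavior-cloning batches. The 25% teleoperation fraction and the placeholder data are illustrative assumptions, not the paper's training recipe.

```python
# A sketch of mixing a few minutes of teleoperated demonstrations with a much
# larger in-the-wild pool when sampling behavior-cloning batches.
import random

def sample_batch(teleop, in_the_wild, batch_size=32, teleop_fraction=0.25):
    """Draw a training batch that mixes the two demonstration sources."""
    n_teleop = int(batch_size * teleop_fraction)
    batch = random.choices(teleop, k=n_teleop)
    batch += random.choices(in_the_wild, k=batch_size - n_teleop)
    random.shuffle(batch)
    return batch

# Toy usage with placeholder (observation, joint_target) pairs.
teleop = [("obs_t%d" % i, "act_t%d" % i) for i in range(20)]     # small teleoperated set
wild   = [("obs_w%d" % i, "act_w%d" % i) for i in range(2000)]   # large in-the-wild pool
print(len(sample_batch(teleop, wild)))  # -> 32
```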

Flexible Handover with Real-Time Robust Dynamic Grasp Trajectory Generation

Aug 29, 2023
Gu Zhang, Hao-Shu Fang, Hongjie Fang, Cewu Lu

In recent years, significant effort has been dedicated to developing efficient, robust, and general human-to-robot handover systems. However, flexible handover under complex and continuous object motion remains relatively unexplored. In this work, we propose an approach for effective and robust flexible handover, which enables the robot to grasp moving objects that follow flexible motion trajectories with a high success rate. The key innovation of our approach is the generation of real-time robust grasp trajectories. We also design a future grasp prediction algorithm to enhance the system's adaptability to dynamic handover scenes. We conduct one-motion and motion-continuous handover experiments on our novel benchmark of 31 diverse household objects. The system allows users to move and rotate objects in their hands within a relatively large range, and the robot grasps such moving objects with a 78.15% success rate over the entire household object benchmark.

* Accepted to IROS 2023 
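
To make the role of future grasp prediction concrete, here is a minimal stand-in in Python that extrapolates the tracked grasp center forward by the system's reaction latency, so the robot aims at where the object will be rather than where it was. The constant-velocity fit and the 0.3 s lookahead are illustrative assumptions, not the predictor described in the paper.

```python
# A constant-velocity stand-in for future grasp prediction: fit recent grasp
# positions against time and extrapolate by the lookahead horizon.
import numpy as np

def predict_future_grasp(positions, timestamps, lookahead_s=0.3):
    """positions: (N, 3) recent grasp-center observations; timestamps: (N,) seconds."""
    positions = np.asarray(positions, dtype=float)
    timestamps = np.asarray(timestamps, dtype=float)
    if len(positions) < 2:
        return positions[-1]
    # Least-squares linear fit of each coordinate against time gives a velocity estimate.
    A = np.stack([timestamps, np.ones_like(timestamps)], axis=1)
    coef, *_ = np.linalg.lstsq(A, positions, rcond=None)   # coef[0]: velocity, coef[1]: offset
    t_future = timestamps[-1] + lookahead_s
    return coef[0] * t_future + coef[1]

# Toy usage: an object drifting along +x at 0.2 m/s.
ts = np.array([0.0, 0.1, 0.2, 0.3])
ps = np.stack([0.2 * ts, np.zeros(4), 0.5 * np.ones(4)], axis=1)
print(predict_future_grasp(ps, ts, lookahead_s=0.3))  # ~ [0.12, 0.0, 0.5]
```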

RH20T: A Robotic Dataset for Learning Diverse Skills in One-Shot

Jul 02, 2023
Hao-Shu Fang, Hongjie Fang, Zhenyu Tang, Jirong Liu, Junbo Wang, Haoyi Zhu, Cewu Lu

A key challenge in robotic manipulation in open domains is how to acquire diverse and generalizable skills for robots. Recent research in one-shot imitation learning has shown promise in transferring trained policies to new tasks based on demonstrations. This capability is attractive for enabling robots to acquire new skills and for improving task and motion planning. However, due to limitations in training datasets, the community has mainly focused on simple cases, such as push or pick-place tasks, relying solely on visual guidance. In reality, there are many complex skills, some of which may even require both visual and tactile perception to solve. This paper aims to unlock the potential for an agent to generalize to hundreds of real-world skills with multi-modal perception. To achieve this, we have collected a dataset comprising over 110,000 contact-rich robot manipulation sequences across diverse skills, contexts, robots, and camera viewpoints, all collected in the real world. Each sequence in the dataset includes visual, force, audio, and action information, along with a corresponding human demonstration video. We have invested significant effort in calibrating all the sensors and ensuring a high-quality dataset. The dataset is publicly available at rh20t.github.io

* RSS 2023 workshop on LTAMP. The project page is at rh20t.github.io 
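
As an illustration of what one multi-modal, contact-rich sequence could look like in code, here is a minimal Python sketch. The field names, shapes, and example labels ("pour", "flexiv", the camera and video identifiers) are assumptions made for illustration, not the actual RH20T on-disk format.

```python
# A sketch of a multi-modal manipulation sequence: per-frame vision, force,
# audio, and action, paired with a human demonstration video.
from dataclasses import dataclass, field
import numpy as np

@dataclass
class Frame:
    rgb: np.ndarray            # camera image at this timestep
    force_torque: np.ndarray   # 6-D wrist force/torque reading
    audio_chunk: np.ndarray    # short audio window aligned to the frame
    action: np.ndarray         # commanded robot action (e.g. Cartesian pose + gripper)

@dataclass
class Sequence:
    skill: str                 # skill label, e.g. "pour"
    robot: str                 # robot platform identifier
    camera_id: str             # which calibrated viewpoint
    human_demo_video: str      # path to the paired human demonstration
    frames: list = field(default_factory=list)

seq = Sequence(skill="pour", robot="flexiv", camera_id="cam_0",
               human_demo_video="human/pour_000.mp4")
seq.frames.append(Frame(np.zeros((720, 1280, 3), np.uint8),
                        np.zeros(6), np.zeros(1600), np.zeros(7)))
print(seq.skill, len(seq.frames))  # -> pour 1
```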

AnyGrasp: Robust and Efficient Grasp Perception in Spatial and Temporal Domains

Dec 16, 2022
Hao-Shu Fang, Chenxi Wang, Hongjie Fang, Minghao Gou, Jirong Liu, Hengxu Yan, Wenhai Liu, Yichen Xie, Cewu Lu

As the basis for prehensile manipulation, it is vital to enable robots to grasp as robustly as humans do. In daily manipulation, our grasping system is prompt, accurate, flexible, and continuous across spatial and temporal domains. Few existing methods cover all of these properties for robot grasping. In this paper, we propose a new methodology for grasp perception that endows robots with these abilities. Specifically, we develop a dense supervision strategy with real perception and analytic labels in the spatial-temporal domain. Additional awareness of objects' centers of mass is incorporated into the learning process to help improve grasping stability. Utilizing grasp correspondence across observations enables dynamic grasp tracking. Our model, AnyGrasp, can generate accurate, full-DoF, dense, and temporally smooth grasp poses efficiently, and works robustly against large depth-sensing noise. Embedded with AnyGrasp, we achieve a 93.3% success rate when clearing bins with over 300 unseen objects, which is comparable with human subjects under controlled conditions. Over 900 mean picks per hour (MPPH) are reported on a single-arm system. For dynamic grasping, we demonstrate catching a swimming robot fish in water.

* project page at https://graspnet.net/anygrasp.html 
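
A minimal sketch of the grasp-correspondence idea that underlies dynamic grasp tracking: associate each grasp detected in the new observation with its nearest predecessor so the robot can keep servoing toward the same target. The Euclidean metric on grasp centers and the 5 cm gating threshold are illustrative assumptions, not AnyGrasp's actual association rule.

```python
# Nearest-neighbor association of grasp centers between consecutive observations.
import numpy as np

def associate_grasps(prev_centers, new_centers, max_dist=0.05):
    """Return a dict mapping index in new_centers -> index in prev_centers."""
    prev_centers = np.asarray(prev_centers, dtype=float)
    matches = {}
    for j, c in enumerate(np.asarray(new_centers, dtype=float)):
        d = np.linalg.norm(prev_centers - c, axis=1)
        i = int(np.argmin(d))
        if d[i] <= max_dist:          # only match grasps that moved a little
            matches[j] = i
    return matches

prev = [[0.10, 0.00, 0.30], [0.40, 0.10, 0.25]]
new  = [[0.11, 0.01, 0.30], [0.80, 0.10, 0.25]]
print(associate_grasps(prev, new))   # -> {0: 0}; the second grasp has no match
```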

TransCG: A Large-Scale Real-World Dataset for Transparent Object Depth Completion and Grasping

Feb 17, 2022
Hongjie Fang, Hao-Shu Fang, Sheng Xu, Cewu Lu

Transparent objects are common in our daily lives and are frequently handled on automated production lines. Robust vision-based robotic grasping and manipulation of these objects would benefit automation. However, the majority of current grasping algorithms fail in this case, since they rely heavily on the depth image, while ordinary depth sensors usually fail to produce accurate depth for transparent objects owing to the reflection and refraction of light. In this work, we address this issue by contributing a large-scale real-world dataset for transparent object depth completion, which contains 57,715 RGB-D images from 130 different scenes. Our dataset is the first large-scale, real-world dataset for this task and provides the most comprehensive annotations. Cross-domain experiments show that our dataset generalizes well. Moreover, we propose an end-to-end depth completion network that takes the RGB image and the inaccurate depth map as inputs and outputs a refined depth map. Experiments demonstrate the superior efficacy, efficiency, and robustness of our method over previous works, and it is able to process high-resolution images under limited hardware resources. Real-robot experiments show that our method can also be applied robustly to grasping novel objects. The full dataset and our method are publicly available at www.graspnet.net/transcg.

* project page: www.graspnet.net/transcg 
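
The interface the abstract describes is a network that maps an RGB image plus an inaccurate depth map to a refined depth map. The Python sketch below shows only that input/output contract, with a trivial median-fill placeholder where the learned network would go; it is not the paper's model.

```python
# Placeholder for the depth-completion interface: (rgb, raw_depth) -> refined depth.
import numpy as np

def complete_depth(rgb, raw_depth):
    """rgb: (H, W, 3) uint8; raw_depth: (H, W) meters, 0 where the sensor failed
    (typical for transparent objects). Returns a depth map with no holes."""
    refined = raw_depth.astype(float).copy()
    invalid = refined <= 0
    if invalid.any() and (~invalid).any():
        refined[invalid] = np.median(refined[~invalid])   # naive fill, stands in for the network
    return refined

rgb = np.zeros((480, 640, 3), np.uint8)
depth = np.full((480, 640), 0.8)
depth[100:200, 100:200] = 0.0          # a transparent object punched a hole in the depth map
print(float(complete_depth(rgb, depth)[150, 150]))  # -> 0.8
```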