David Held

FlowBot++: Learning Generalized Articulated Objects Manipulation via Articulation Projection

Jun 22, 2023
Harry Zhang, Ben Eisner, David Held

Understanding and manipulating articulated objects, such as doors and drawers, is crucial for robots operating in human environments. We wish to develop a system that can learn to articulate novel objects with no prior interaction, after training on other articulated objects. Previous approaches to articulated object manipulation rely either on modular methods, which are brittle, or on end-to-end methods, which lack generalizability. This paper presents FlowBot++, a deep 3D vision-based robotic system that predicts dense per-point motion and dense articulation parameters of articulated objects to assist in downstream manipulation tasks. FlowBot++ introduces a novel per-point representation of the articulated motion and articulation parameters; the two are combined to produce a more accurate estimate than either representation alone. Simulated experiments on the PartNet-Mobility dataset validate the performance of our system in articulating a wide range of objects, while real-world experiments on real objects' point clouds and a Sawyer robot demonstrate the generalizability and feasibility of our system in real-world scenarios.
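As a rough illustration of how dense articulation parameters can be converted into per-point motion directions (the function name, median aggregation, and unit-magnitude output below are assumptions for the sketch, not FlowBot++'s exact formulation):

    # Hypothetical sketch: turn dense articulation-parameter predictions into
    # per-point motion directions for a revolute or prismatic joint.
    import numpy as np

    def articulation_to_motion(points, axis_origins, axis_dirs, joint_type="revolute"):
        """points, axis_origins, axis_dirs: (N, 3) arrays; returns unit per-point motion directions."""
        origin = np.median(axis_origins, axis=0)          # aggregate the dense per-point predictions
        direction = np.median(axis_dirs, axis=0)
        direction /= np.linalg.norm(direction)
        if joint_type == "revolute":
            motion = np.cross(direction, points - origin)  # instantaneous velocity of rotation about the axis
        else:  # prismatic: every point translates along the axis
            motion = np.tile(direction, (len(points), 1))
        norms = np.linalg.norm(motion, axis=1, keepdims=True)
        return motion / np.clip(norms, 1e-8, None)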

* arXiv admin note: text overlap with arXiv:2205.04382 

One Policy to Dress Them All: Learning to Dress People with Diverse Poses and Garments

Jun 21, 2023
Yufei Wang, Zhanyi Sun, Zackory Erickson, David Held

Robot-assisted dressing could benefit the lives of many people, such as older adults and individuals with disabilities. Despite this potential, robot-assisted dressing remains a challenging task for robotics, as it involves complex manipulation of deformable cloth in 3D space. Many prior works aim to solve the robot-assisted dressing task, but they make assumptions, such as a fixed garment and a fixed arm pose, that limit their ability to generalize. In this work, we develop a robot-assisted dressing system that is able to dress different garments on people with diverse poses from partial point cloud observations, based on a learned policy. We show that with proper design of the policy architecture and Q function, reinforcement learning (RL) can be used to learn effective policies from partial point cloud observations that work well for dressing diverse garments. We further leverage policy distillation to combine multiple policies trained on different ranges of human arm poses into a single policy that works over a wide range of arm poses. We conduct comprehensive real-world evaluations of our system with 510 dressing trials in a human study with 17 participants, covering different arm poses and garments. Our system is able to dress 86% of the length of the participants' arms on average. Videos can be found on our project webpage: https://sites.google.com/view/one-policy-dress.
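A minimal sketch of the policy-distillation step described above, assuming simple behavior cloning onto frozen teacher policies with an MSE loss (the sampler interface, loss, and optimizer settings are illustrative, not the paper's exact training recipe):

    # Distill several pose-range-specific teacher policies into one student policy.
    import torch
    import torch.nn as nn

    def distill(student, teachers, sample_obs, steps=1000, batch=64):
        """teachers: list of frozen policies; sample_obs(i, batch) returns a (batch, obs_dim)
        tensor of observations drawn from the i-th teacher's arm-pose range."""
        opt = torch.optim.Adam(student.parameters(), lr=1e-4)
        for _ in range(steps):
            for i, teacher in enumerate(teachers):
                obs = sample_obs(i, batch)              # observations from pose range i
                with torch.no_grad():
                    target = teacher(obs)               # teacher action used as supervision
                loss = nn.functional.mse_loss(student(obs), target)
                opt.zero_grad(); loss.backward(); opt.step()
        return student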

* RSS 2023. Last two authors: equal advising 

Learning Hybrid Actor-Critic Maps for 6D Non-Prehensile Manipulation

May 06, 2023
Wenxuan Zhou, Bowen Jiang, Fan Yang, Chris Paxton, David Held

Manipulating objects without grasping them is an essential component of human dexterity, referred to as non-prehensile manipulation. Non-prehensile manipulation may enable more complex interactions with objects, but also presents challenges in reasoning about those interactions. In this work, we introduce Hybrid Actor-Critic Maps for Manipulation (HACMan), a reinforcement learning approach for 6D non-prehensile manipulation of objects using point cloud observations. HACMan proposes a temporally-abstracted and spatially-grounded object-centric action representation that consists of selecting a contact location from the object point cloud and a set of motion parameters describing how the robot will move after making contact. We modify an existing off-policy RL algorithm to learn in this hybrid discrete-continuous action representation. We evaluate HACMan on a 6D object pose alignment task both in simulation and in the real world. On the hardest version of our task, with randomized initial poses, randomized 6D goals, and diverse object categories, our policy demonstrates strong generalization to unseen object categories without a performance drop, achieving a 79% success rate on non-flat objects. Compared to alternative action representations, HACMan achieves a success rate more than three times higher than the best baseline. With zero-shot sim2real transfer, our policy can successfully manipulate unseen objects in the real world for challenging non-planar goals, using dynamic and contact-rich non-prehensile skills. Videos can be found on the project website: https://hacman-2023.github.io.
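A hypothetical sketch of acting with this hybrid discrete-continuous representation: score every object point as a candidate contact location, pick one, then read off continuous motion parameters for that point (the critic/actor head interfaces and the greedy selection rule are simplifications, not HACMan's exact architecture):

    # Hybrid action selection: discrete contact point + continuous motion parameters.
    import torch

    def select_action(per_point_feats, critic_head, actor_head, greedy=True):
        """per_point_feats: (N, F) features for N object points."""
        scores = critic_head(per_point_feats).squeeze(-1)       # (N,) value of contacting each point
        if greedy:
            idx = torch.argmax(scores)
        else:
            idx = torch.distributions.Categorical(logits=scores).sample()
        motion_params = actor_head(per_point_feats[idx])         # continuous post-contact motion
        return idx.item(), motion_params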

Bagging by Learning to Singulate Layers Using Interactive Perception

Mar 29, 2023
Lawrence Yunliang Chen, Baiyu Shi, Roy Lin, Daniel Seita, Ayah Ahmad, Richard Cheng, Thomas Kollar, David Held, Ken Goldberg

Many fabric handling and 2D deformable material tasks in homes and industry require singulating layers of material such as opening a bag or arranging garments for sewing. In contrast to methods requiring specialized sensing or end effectors, we use only visual observations with ordinary parallel jaw grippers. We propose SLIP: Singulating Layers using Interactive Perception, and apply SLIP to the task of autonomous bagging. We develop SLIP-Bagging, a bagging algorithm that manipulates a plastic or fabric bag from an unstructured state, and uses SLIP to grasp the top layer of the bag to open it for object insertion. In physical experiments, a YuMi robot achieves a success rate of 67% to 81% across bags of a variety of materials, shapes, and sizes, significantly improving in success rate and generality over prior work. Experiments also suggest that SLIP can be applied to tasks such as singulating layers of folded cloth and garments. Supplementary material is available at https://sites.google.com/view/slip-bagging/.

Point Cloud Forecasting as a Proxy for 4D Occupancy Forecasting

Mar 01, 2023
Tarasha Khurana, Peiyun Hu, David Held, Deva Ramanan

Predicting how the world can evolve in the future is crucial for motion planning in autonomous systems. Classical methods are limited because they rely on costly human annotations (semantic class labels, bounding boxes, and tracks, or HD maps of cities) to plan their motion, and thus are difficult to scale to large unlabeled datasets. One promising self-supervised task is 3D point cloud forecasting from unannotated LiDAR sequences. We show that this task requires algorithms to implicitly capture (1) sensor extrinsics (i.e., the egomotion of the autonomous vehicle), (2) sensor intrinsics (i.e., the sampling pattern specific to the particular LiDAR sensor), and (3) the shape and motion of other objects in the scene. But autonomous systems should make predictions about the world, not about their sensors. To this end, we factor out (1) and (2) by recasting the task as one of spacetime (4D) occupancy forecasting. Because ground-truth 4D occupancy is expensive to obtain, we render point cloud data from 4D occupancy predictions given sensor extrinsics and intrinsics, allowing one to train and test occupancy algorithms with unannotated LiDAR sequences. This also allows one to evaluate and compare point cloud forecasting algorithms across diverse datasets, sensors, and vehicles.
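A minimal sketch of rendering a range measurement along one LiDAR ray from predicted occupancy, using a standard expected-termination-depth formulation (the uniform sampling and this particular formulation are assumptions; the paper's renderer may differ):

    # Render the expected range along a single ray from per-sample occupancy probabilities.
    import torch

    def expected_range(occ_probs, step=0.5):
        """occ_probs: (S,) termination probability at each sample along the ray, ordered near-to-far."""
        free = torch.cumprod(1.0 - occ_probs, dim=0)                 # prob. of the ray reaching each sample
        free_before = torch.cat([torch.ones(1), free[:-1]])          # shift so sample i uses samples < i
        hit = free_before * occ_probs                                # prob. the ray terminates at sample i
        depths = step * torch.arange(1, len(occ_probs) + 1, dtype=occ_probs.dtype)
        return (hit * depths).sum()                                  # expected termination depth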

* CVPR 2023. Project page: https://www.cs.cmu.edu/~tkhurana/ff4d/index.html; Code: https://github.com/tarashakhurana/4d-occ-forecasting 

Active Velocity Estimation using Light Curtains via Self-Supervised Multi-Armed Bandits

Feb 28, 2023
Siddharth Ancha, Gaurav Pathak, Ji Zhang, Srinivasa Narasimhan, David Held

To navigate in an environment safely and autonomously, robots must accurately estimate where obstacles are and how they move. Instead of using expensive traditional 3D sensors, we explore the use of a much cheaper, faster, and higher resolution alternative: programmable light curtains. Light curtains are a controllable depth sensor that sense only along a surface that the user selects. We adapt a probabilistic method based on particle filters and occupancy grids to explicitly estimate the position and velocity of 3D points in the scene using partial measurements made by light curtains. The central challenge is to decide where to place the light curtain to accurately perform this task. We propose multiple curtain placement strategies guided by maximizing information gain and verifying predicted object locations. Then, we combine these strategies using an online learning framework. We propose a novel self-supervised reward function that evaluates the accuracy of current velocity estimates using future light curtain placements. We use a multi-armed bandit framework to intelligently switch between placement policies in real time, outperforming fixed policies. We develop a full-stack navigation system that uses position and velocity estimates from light curtains for downstream tasks such as localization, mapping, path-planning, and obstacle avoidance. This work paves the way for controllable light curtains to accurately, efficiently, and purposefully perceive and navigate complex and dynamic environments. Project website: https://siddancha.github.io/projects/active-velocity-estimation/
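An illustrative EXP3-style bandit for switching between curtain-placement strategies; the specific bandit algorithm and reward scaling used in the paper are not assumed here:

    # EXP3 sketch: maintain a weight per placement strategy, sample one each round,
    # and boost it in proportion to the importance-weighted self-supervised reward.
    import numpy as np

    def exp3_select(weights, gamma=0.1):
        """Return (chosen arm index, sampling probabilities) for the current round."""
        k = len(weights)
        probs = (1 - gamma) * weights / weights.sum() + gamma / k
        return np.random.choice(k, p=probs), probs

    def exp3_update(weights, probs, arm, reward, gamma=0.1):
        """reward assumed rescaled to [0, 1]; only the chosen arm's weight changes."""
        k = len(weights)
        weights[arm] *= np.exp(gamma * reward / (probs[arm] * k))
        return weights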

* 9 pages (main paper), 3 pages (references), 9 pages (appendix) 

Self-supervised Cloth Reconstruction via Action-conditioned Cloth Tracking

Feb 19, 2023
Zixuan Huang, Xingyu Lin, David Held

State estimation is one of the greatest challenges for cloth manipulation due to cloth's high dimensionality and self-occlusion. Prior works propose to identify the full state of crumpled cloth by training a mesh reconstruction model in simulation. However, such models are prone to a sim-to-real gap due to differences between cloth simulation and the real world. In this work, we propose a self-supervised method to finetune a mesh reconstruction model in the real world. Since the full mesh of crumpled cloth is difficult to obtain in the real world, we design a special data collection scheme and an action-conditioned model-based cloth tracking method to generate pseudo-labels for self-supervised learning. By finetuning the pretrained mesh reconstruction model on this pseudo-labeled dataset, we show that we can improve the quality of the reconstructed mesh without requiring human annotations, and improve the performance of a downstream manipulation task.
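A rough sketch of the finetuning stage, assuming the tracker's output mesh is used directly as a vertex-wise regression target (the loss and training-loop details are assumptions, not the paper's exact procedure):

    # Finetune a pretrained mesh reconstruction model on pseudo-labels from the
    # action-conditioned cloth tracker.
    import torch

    def finetune(recon_model, real_dataset, tracker, epochs=5, lr=1e-5):
        opt = torch.optim.Adam(recon_model.parameters(), lr=lr)
        for _ in range(epochs):
            for point_cloud, action_history in real_dataset:
                pseudo_mesh = tracker(point_cloud, action_history).detach()  # pseudo-label vertices
                pred_mesh = recon_model(point_cloud)
                loss = torch.nn.functional.mse_loss(pred_mesh, pseudo_mesh)
                opt.zero_grad(); loss.backward(); opt.step()
        return recon_model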

* International Conference on Robotics and Automation 2023  

Deep Projective Rotation Estimation through Relative Supervision

Nov 21, 2022
Brian Okorn, Chuer Pan, Martial Hebert, David Held

Orientation estimation is at the core of a variety of vision and robotics tasks such as camera and object pose estimation. Deep learning has offered a way to develop image-based orientation estimators; however, such estimators often require training on a large labeled dataset, which can be time-intensive to collect. In this work, we explore whether self-supervised learning from unlabeled data can be used to alleviate this issue. Specifically, we assume access to estimates of the relative orientation between neighboring poses, such as can be obtained via a local alignment method. While self-supervised learning has been used successfully for translational object keypoints, in this work, we show that naively applying relative supervision to the rotational group $SO(3)$ will often fail to converge due to the non-convexity of the rotational space. To tackle this challenge, we propose a new algorithm for self-supervised orientation estimation which utilizes Modified Rodrigues Parameters to stereographically project the closed manifold of $SO(3)$ to the open manifold of $\mathbb{R}^{3}$, allowing the optimization to be done in an open Euclidean space. We empirically validate the benefits of the proposed algorithm for the rotation averaging problem in two settings: (1) direct optimization of rotation parameters, and (2) optimization of the parameters of a convolutional neural network that predicts object orientations from images. In both settings, we demonstrate that our proposed algorithm converges to a consistent relative orientation frame much faster than algorithms that purely operate in the $SO(3)$ space. Additional information can be found at https://sites.google.com/view/deep-projective-rotation/home.
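For reference, the Modified Rodrigues Parameters mentioned above are the standard stereographic projection of the unit-quaternion double cover of $SO(3)$ onto $\mathbb{R}^{3}$ (this is the textbook definition, not a detail taken from the paper). For a unit quaternion $q = (q_0, \mathbf{q}_v)$ encoding a rotation by angle $\theta$ about unit axis $\hat{e}$,

    $\sigma(q) = \dfrac{\mathbf{q}_v}{1 + q_0} = \hat{e}\,\tan\!\left(\dfrac{\theta}{4}\right)$

which is finite for all rotations except $\theta = 2\pi$ (the projection point $q_0 = -1$), yielding the open Euclidean space in which the optimization is performed.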

* Conference on Robot Learning (CoRL), 2022. Supplementary material is available at https://sites.google.com/view/deep-projective-rotation/home 

TAX-Pose: Task-Specific Cross-Pose Estimation for Robot Manipulation

Nov 17, 2022
Chuer Pan, Brian Okorn, Harry Zhang, Ben Eisner, David Held

How do we imbue robots with the ability to efficiently manipulate unseen objects and transfer relevant skills based on demonstrations? End-to-end learning methods often fail to generalize to novel objects or unseen configurations. Instead, we focus on the task-specific pose relationship between relevant parts of interacting objects. We conjecture that this relationship is a generalizable notion of a manipulation task that can transfer to new objects in the same category; examples include the relationship between the pose of a pan relative to an oven or the pose of a mug relative to a mug rack. We call this task-specific pose relationship "cross-pose" and provide a mathematical definition of this concept. We propose a vision-based system that learns to estimate the cross-pose between two objects for a given manipulation task using learned cross-object correspondences. The estimated cross-pose is then used to guide a downstream motion planner to manipulate the objects into the desired pose relationship (placing a pan into the oven or the mug onto the mug rack). We demonstrate our method's capability to generalize to unseen objects, in some cases after training on only 10 demonstrations in the real world. Results show that our system achieves state-of-the-art performance in both simulated and real-world experiments across a number of tasks. Supplementary information and videos can be found at https://sites.google.com/view/tax-pose/home.
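For context, a generic way to turn predicted (soft) cross-object correspondences into a rigid transform is weighted Procrustes alignment via SVD; the sketch below shows that standard step, not necessarily TAX-Pose's exact formulation:

    # Weighted Kabsch/Procrustes: recover a rigid transform from weighted point correspondences.
    import numpy as np

    def weighted_kabsch(src, tgt, w):
        """src, tgt: (N, 3) corresponding points; w: (N,) non-negative weights.
        Returns R (3x3), t (3,) minimizing sum_i w_i * ||R @ src_i + t - tgt_i||^2."""
        w = w / w.sum()
        src_mean = (w[:, None] * src).sum(0)
        tgt_mean = (w[:, None] * tgt).sum(0)
        src_c, tgt_c = src - src_mean, tgt - tgt_mean     # remove weighted centroids
        H = (w[:, None] * src_c).T @ tgt_c                # 3x3 cross-covariance
        U, _, Vt = np.linalg.svd(H)
        d = np.sign(np.linalg.det(Vt.T @ U.T))            # guard against reflections
        R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
        t = tgt_mean - R @ src_mean
        return R, t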

* Conference on Robot Learning (CoRL), 2022. Supplementary material is available at https://sites.google.com/view/tax-pose/home 