Tucker Hermans

Point Cloud Models Improve Visual Robustness in Robotic Learners

Apr 29, 2024

Skand Peri, Iain Lee, Chanho Kim, Li Fuxin, Tucker Hermans, Stefan Lee

Abstract: Visual control policies can encounter significant performance degradation when visual conditions like lighting or camera position differ from those seen during training, often exhibiting sharp declines in capability even for minor differences. In this work, we examine robustness to a suite of these types of visual changes for RGB-D and point cloud based visual control policies. To perform these experiments on both model-free and model-based reinforcement learners, we introduce a novel Point Cloud World Model (PCWM) and point cloud based control policies. Our experiments show that policies that explicitly encode point clouds are significantly more robust than their RGB-D counterparts. Further, we find our proposed PCWM significantly outperforms prior works in terms of sample efficiency during training. Taken together, these results suggest reasoning about the 3D scene through point clouds can improve performance, reduce learning time, and increase robustness for robotic learners. Project Webpage: https://pvskand.github.io/projects/PCWM

* Accepted at International Conference on Robotics and Automation, 2024
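
For readers unfamiliar with the difference between the two input types compared above, the sketch below shows the standard pinhole back-projection that turns a depth image into a point cloud. It is a generic illustration, not the paper's pipeline: the function name `depth_to_point_cloud` and the intrinsics `fx`, `fy`, `cx`, `cy` are assumptions introduced here, and the tiny 2x2 depth image is made-up data.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def depth_to_point_cloud(
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point]:
    """Back-project a depth image into camera-frame 3D points.

    Hypothetical helper using the pinhole camera model; pixels with
    non-positive depth are treated as missing and skipped.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):        # v indexes image rows
        for u, z in enumerate(row):        # u indexes image columns
            if z <= 0.0:
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Made-up 2x2 depth image (metres); one pixel has no valid reading.
depth = [[1.0, 1.0], [0.0, 2.0]]
cloud = depth_to_point_cloud(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
print(cloud)
```

The resulting points are expressed in metric 3D coordinates rather than pixel coordinates, which is the representational difference between RGB-D and point cloud inputs that the abstract contrasts.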

V-PRISM: Probabilistic Mapping of Unknown Tabletop Scenes

Mar 14, 2024

Herbert Wright, Weiming Zhi, Matthew Johnson-Roberson, Tucker Hermans

Abstract: The ability to construct concise scene representations from sensor input is central to the field of robotics. This paper addresses the problem of robustly creating a 3D representation of a tabletop scene from a segmented RGB-D image. These representations are then critical for a range of downstream manipulation tasks. Many previous attempts to tackle this problem do not capture accurate uncertainty, which is required to subsequently produce safe motion plans. In this paper, we cast the representation of 3D tabletop scenes as a multi-class classification problem. To tackle this, we introduce V-PRISM, a framework and method for robustly creating probabilistic 3D segmentation maps of tabletop scenes. Our maps contain occupancy estimates, segmentation information, and principled uncertainty measures. We evaluate the robustness of our method in (1) procedurally generated scenes using open-source object datasets, and (2) real-world tabletop data collected from a depth camera. Our experiments show that our approach outperforms alternative continuous reconstruction approaches that do not explicitly reason about objects in a multi-class formulation.

Pick and Place Planning is Better than Pick Planning then Place Planning

Jan 29, 2024

Mohanraj Devendran Shanthi, Tucker Hermans

Abstract: Robotic pick and place stands at the heart of autonomous manipulation. When conducted in cluttered or complex environments, robots must jointly reason about the selected grasp and desired placement locations to ensure success. While several works have examined this joint pick-and-place problem, none have fully leveraged recent learning-based approaches for multi-fingered grasp planning. We present a modular algorithm for joint pick and place planning that can make use of state-of-the-art grasp classifiers for planning multi-fingered grasps for novel objects from partial-view point clouds. We demonstrate our joint pick and place formulation with several costs associated with different placement tasks. Experiments on pick and place tasks with cluttered scenes using a physical robot show that our joint inference method is more successful than a sequential pick then place approach, while also achieving better placement configurations.

* 8 pages, 14 figures, IEEE RA-L

Out of Sight, Still in Mind: Reasoning and Planning about Unobserved Objects with Video Tracking Enabled Memory Models

Sep 26, 2023

Yixuan Huang, Jialin Yuan, Chanho Kim, Pupul Pradhan, Bryan Chen, Li Fuxin, Tucker Hermans

Abstract: Robots need to have a memory of previously observed, but currently occluded objects to work reliably in realistic environments. We investigate the problem of encoding object-oriented memory into a multi-object manipulation reasoning and planning framework. We propose DOOM and LOOM, which leverage transformer relational dynamics to encode the history of trajectories given partial-view point clouds and an object discovery and tracking engine. Our approaches can perform multiple challenging tasks including reasoning with occluded objects, novel object appearance, and object reappearance. Throughout our extensive simulation and real-world experiments, we find that our approaches perform well across different numbers of objects and different numbers of distractor actions. Furthermore, we show our approaches outperform an implicit memory baseline.

* Under review

DefGoalNet: Contextual Goal Learning from Demonstrations For Deformable Object Manipulation

Sep 25, 2023

Bao Thach, Tanner Watts, Shing-Hei Ho, Tucker Hermans, Alan Kuntz

Abstract: Shape servoing, a robotic task dedicated to controlling objects to desired goal shapes, is a promising approach to deformable object manipulation. An issue arises, however, with the reliance on the specification of a goal shape. This goal has been obtained either by a laborious domain knowledge engineering process or by manually manipulating the object into the desired shape and capturing the goal shape at that specific moment, both of which are impractical in various robotic applications. In this paper, we solve this problem by developing a novel neural network DefGoalNet, which learns deformable object goal shapes directly from a small number of human demonstrations. We demonstrate our method's effectiveness on various robotic tasks, both in simulation and on a physical robot. Notably, in the surgical retraction task, even when trained with as few as 10 demonstrations, our method achieves a median success percentage of nearly 90%. These results mark a substantial advancement in enabling shape servoing methods to bring deformable object manipulation closer to practical, real-world applications.

* Submitted to IEEE Conference on Robotics and Automation (ICRA) 2024. 8 pages, 11 figures

Latent Space Planning for Multi-Object Manipulation with Environment-Aware Relational Classifiers

May 18, 2023

Yixuan Huang, Nichols Crawford Taylor, Adam Conkey, Weiyu Liu, Tucker Hermans

Abstract: Objects rarely sit in isolation in everyday human environments. If we want robots to operate and perform tasks in our human environments, they must understand how the objects they manipulate will interact with structural elements of the environment for all but the simplest of tasks. As such, we'd like our robots to reason about how multiple objects and environmental elements relate to one another and how those relations may change as the robot interacts with the world. We examine the problem of predicting inter-object and object-environment relations between previously unseen objects and novel environments purely from partial-view point clouds. Our approach enables robots to plan and execute sequences to complete multi-object manipulation tasks defined from logical relations. This removes the burden of providing explicit, continuous object states as goals to the robot. We explore several different neural network architectures for this task. We find the best performing model to be a novel transformer-based neural network that both predicts object-environment relations and learns a latent-space dynamics function. We achieve reliable sim-to-real transfer without any fine-tuning. Our experiments show that our model understands how changes in observed environmental geometry relate to semantic relations between objects. We show more videos on our website: https://sites.google.com/view/erelationaldynamics.

* Under review. arXiv admin note: text overlap with arXiv:2209.11943

DeformerNet: Learning Bimanual Manipulation of 3D Deformable Objects

May 08, 2023

Bao Thach, Brian Y. Cho, Tucker Hermans, Alan Kuntz

Abstract: Applications in fields ranging from home care to warehouse fulfillment to surgical assistance require robots to reliably manipulate the shape of 3D deformable objects. Analytic models of elastic, 3D deformable objects require numerous parameters to describe the potentially infinite degrees of freedom present in determining the object's shape. Previous attempts at performing 3D shape control rely on hand-crafted features to represent the object shape and require training of object-specific control models. We overcome these issues through the use of our novel DeformerNet neural network architecture, which operates on a partial-view point cloud of the manipulated object and a point cloud of the goal shape to learn a low-dimensional representation of the object shape. This shape embedding enables the robot to learn a visual servo controller that computes the desired robot end-effector action to iteratively deform the object toward the target shape. We demonstrate both in simulation and on a physical robot that DeformerNet reliably generalizes to object shapes and material stiffness not seen during training. Crucially, using DeformerNet, the robot successfully accomplishes three surgical sub-tasks: retraction (moving tissue aside to access a site underneath it), tissue wrapping (a sub-task in procedures like aortic stent placements), and connecting two tubular pieces of tissue (a sub-task in anastomosis).

* Submitted to IEEE Transactions on Robotics (T-RO). 18 pages, 25 figures. arXiv admin note: substantial text overlap with arXiv:2110.04685

DefGraspNets: Grasp Planning on 3D Fields with Graph Neural Nets

Mar 28, 2023

Isabella Huang, Yashraj Narang, Ruzena Bajcsy, Fabio Ramos, Tucker Hermans, Dieter Fox

Abstract: Robotic grasping of 3D deformable objects is critical for real-world applications such as food handling and robotic surgery. Unlike rigid and articulated objects, 3D deformable objects have infinite degrees of freedom. Fully defining their state requires 3D deformation and stress fields, which are exceptionally difficult to analytically compute or experimentally measure. Thus, evaluating grasp candidates for grasp planning typically requires accurate but slow 3D finite element method (FEM) simulation. Sampling-based grasp planning is often impractical, as it requires evaluation of a large number of grasp candidates. Gradient-based grasp planning can be more efficient, but requires a differentiable model to synthesize optimal grasps from initial candidates. Differentiable FEM simulators may fill this role, but are typically no faster than standard FEM. In this work, we propose learning a predictive graph neural network (GNN), DefGraspNets, to act as our differentiable model. We train DefGraspNets to predict 3D stress and deformation fields based on FEM-based grasp simulations. DefGraspNets not only runs up to 1500 times faster than the FEM simulator, but also enables fast gradient-based grasp optimization over 3D stress and deformation metrics. We design DefGraspNets to align with real-world grasp planning practices and demonstrate generalization across multiple test sets, including real-world experiments.

* To be published in the IEEE Conference on Robotics and Automation (ICRA), 2023
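
The idea of refining a grasp candidate by following the gradient of a differentiable model, as described in the abstract above, can be sketched in a few lines. This is only a toy illustration under loud assumptions: `toy_stress` is a hypothetical quadratic stand-in for a learned predictor (it is not DefGraspNets), the three-number grasp parameterisation and the `target` values are made up, and `refine_grasp` is plain gradient descent rather than the optimizer used in the paper.

```python
from typing import Callable, List, Sequence

# Hypothetical optimum of the toy metric; purely illustrative numbers.
TARGET = (0.2, -0.1, 0.5)


def toy_stress(grasp: Sequence[float]) -> float:
    """Toy differentiable metric standing in for a learned stress predictor."""
    return sum((g - t) ** 2 for g, t in zip(grasp, TARGET))


def toy_stress_grad(grasp: Sequence[float]) -> List[float]:
    """Analytic gradient of toy_stress with respect to the grasp parameters."""
    return [2.0 * (g - t) for g, t in zip(grasp, TARGET)]


def refine_grasp(
    initial: Sequence[float],
    grad_fn: Callable[[Sequence[float]], List[float]],
    step_size: float = 0.1,
    num_steps: int = 100,
) -> List[float]:
    """Improve one grasp candidate by gradient descent on the metric."""
    grasp = list(initial)
    for _ in range(num_steps):
        grad = grad_fn(grasp)
        grasp = [g - step_size * d for g, d in zip(grasp, grad)]
    return grasp


initial = [0.0, 0.0, 0.0]
refined = refine_grasp(initial, toy_stress_grad)
print(toy_stress(initial), toy_stress(refined))
```

Each refinement step costs one gradient evaluation of the model, whereas a sampling-based planner would need to score many separate candidates, which is why a fast differentiable model matters for this style of planning.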

Planning Visual-Tactile Precision Grasps via Complementary Use of Vision and Touch

Dec 16, 2022

Martin Matak, Tucker Hermans

Abstract: Reliably planning fingertip grasps for multi-fingered hands is a key challenge for many tasks including tool use, insertion, and dexterous in-hand manipulation. This task becomes even more difficult when the robot lacks an accurate model of the object to be grasped. Tactile sensing offers a promising approach to account for uncertainties in object shape. However, current robotic hands tend to lack full tactile coverage. As such, a problem arises of how to plan and execute grasps for multi-fingered hands such that contact is made with the area covered by the tactile sensors. To address this issue, we propose an approach to grasp planning that explicitly reasons about where the fingertips should contact the estimated object surface while maximizing the probability of grasp success. Key to our method's success is the use of visual surface estimation for initial planning to encode the contact constraint. The robot then executes this plan using a tactile-feedback controller that enables the robot to adapt to online estimates of the object's surface to correct for errors in the initial plan. Importantly, the robot never explicitly integrates object pose or surface estimates between visual and tactile sensing; instead, it uses the two modalities in complementary ways. Vision guides the robot's motion prior to contact; touch updates the plan when contact occurs differently than predicted from vision. We show that our method successfully synthesizes and executes precision grasps for previously unseen objects using surface estimates from a single camera view. Further, our approach outperforms a state-of-the-art multi-fingered grasp planner, while also beating several baselines we propose.

StructDiffusion: Object-Centric Diffusion for Semantic Rearrangement of Novel Objects

Nov 08, 2022

Weiyu Liu, Tucker Hermans, Sonia Chernova, Chris Paxton

Abstract: Robots operating in human environments must be able to rearrange objects into semantically-meaningful configurations, even if these objects are previously unseen. In this work, we focus on the problem of building physically-valid structures without step-by-step instructions. We propose StructDiffusion, which combines a diffusion model and an object-centric transformer to construct structures out of a single RGB-D image based on high-level language goals, such as "set the table." Our method shows how diffusion models can be used for complex multi-step 3D planning tasks. StructDiffusion improves success rate on assembling physically-valid structures out of unseen objects by 16% on average over an existing multi-modal transformer model, while allowing us to use one multi-task model to produce a wider range of different structures. We show experiments on held-out objects both in simulation and on real-world rearrangement tasks. For videos and additional results, check out our website: http://weiyuliu.com/StructDiffusion/.
