We present a framework for visual action planning of complex manipulation tasks with high-dimensional state spaces such as manipulation of deformable objects. Planning is performed in a low-dimensional latent state space that embeds images. We define and implement a Latent Space Roadmap (LSR) which is a graph-based structure that globally captures the latent system dynamics. Our framework consists of two main components: a Visual Foresight Module (VFM) that generates a visual plan as a sequence of images, and an Action Proposal Network (APN) that predicts the actions between them. We show the effectiveness of the method on a simulated box stacking task as well as a T-shirt folding task performed with a real robot.
In this work we propose algorithms to explicitly construct a conservative estimate of the configuration spaces of rigid objects in 2D and 3D. Our approach is able to detect compact path components and narrow passages in configuration space which are important for applications in robotic manipulation and path planning. Moreover, as we demonstrate, they are also applicable to identification of molecular cages in chemistry. Our algorithms are based on a decomposition of the resulting 3 and 6 dimensional configuration spaces into slices corresponding to a finite sample of fixed orientations in configuration space. We utilize dual diagrams of unions of balls and uniform grids of orientations to approximate the configuration space. We carry out experiments to evaluate the computational efficiency on a set of objects with different geometric features thus demonstrating that our approach is applicable to different object shapes. We investigate the performance of our algorithm by computing increasingly fine-grained approximations of the object's configuration space.
The purpose of this benchmark is to evaluate the planning and control aspects of robotic in-hand manipulation systems. The goal is to assess the system's ability to change the pose of a hand-held object by either using the fingers, environment or a combination of both. Given an object surface mesh from the YCB data-set, we provide examples of initial and goal states (i.e.\ static object poses and fingertip locations) for various in-hand manipulation tasks. We further propose metrics that measure the error in reaching the goal state from a specific initial state, which, when aggregated across all tasks, also serves as a measure of the system's in-hand manipulation capability. We provide supporting software, task examples, and evaluation results associated with the benchmark. All the supporting material is available at https://robot-learning.cs.utah.edu/project/benchmarking_in_hand_manipulation
In this work, we address a planar non-prehensile sorting task. Here, a robot needs to push many densely packed objects belonging to different classes into a configuration where these classes are clearly separated from each other. To achieve this, we propose to employ Monte Carlo tree search equipped with a task-specific heuristic function. We evaluate the algorithm on various simulated sorting tasks and observe its effectiveness in reliably sorting up to 40 convex objects. In addition, we observe that the algorithm is capable to also sort non-convex objects, as well as convex objects in the presence of immovable obstacles.
We propose to leverage a real-world, human activity RGB datasets to teach a robot {\em Task-Oriented Grasping} (TOG). On the one hand, RGB-D datasets that contain hands and objects in interaction often lack annotations due to the manual effort in obtaining them. On the other hand, RGB datasets are often annotated with labels that do not provide enough information to infer a 6D robotic grasp pose. However, they contain examples of grasps on a variety of objects for many different tasks. Thereby, they provide a much richer source of supervision than RGB-D datasets. We propose a model that takes as input an RGB image and outputs a hand pose and configuration as well as an object pose and a shape. We follow the insight that jointly estimating hand and object poses increases accuracy compared to estimating these quantities independently of each other. Quantitative experiments show that training an object pose predictor with the hand pose information (and vice versa) is better than training without this information. Given the trained model, we process an RGB dataset to automatically obtain training data for a TOG model. This model takes as input an object point cloud and a task and outputs a suitable region for grasping, given the task. Qualitative experiments show that our model can successfully process a real-world dataset. Experiments with a robot demonstrate that this data allows a robot to learn task-oriented grasping on novel objects.
To coordinate actions with an interaction partner requires a constant exchange of sensorimotor signals. Humans acquire these skills in infancy and early childhood mostly by imitation learning and active engagement with a skilled partner. They require the ability to predict and adapt to one's partner during an interaction. In this work we want to explore these ideas in a human-robot interaction setting in which a robot is required to learn interactive tasks from a combination of observational and kinesthetic learning. To this end, we propose a deep learning framework consisting of a number of components for (1) human and robot motion embedding, (2) motion prediction of the human partner and (3) generation of robot joint trajectories matching the human motion. To test these ideas, we collect human-human interaction data and human-robot interaction data of four interactive tasks "hand-shake", "hand-wave", "parachute fist-bump" and "rocket fist-bump". We demonstrate experimentally the importance of predictive and adaptive components as well as low-level abstractions to successfully learn to imitate human behavior in interactive social tasks.
Learning dynamics models is an essential component of model-based reinforcement learning. The learned model can be used for multi-step ahead predictions of the state variable, a process referred to as long-term prediction. Due to the recursive nature of the predictions, the accuracy has to be good enough to prevent significant error buildup. Accurate model learning in contact-rich manipulation is challenging due to the presence of varying dynamics regimes and discontinuities. Another challenge is the discontinuity in state evolution caused by impacting conditions. Building on the approach of representing contact dynamics by a system of switching models, we present a solution that also supports discontinuous state evolution. We evaluate our method on a contact-rich motion task, involving a 7-DOF industrial robot, using a trajectory-centric policy and show that it can effectively propagate state distributions through discontinuities.
In socially assistive robotics, an important research area is the development of adaptation techniques and their effect on human-robot interaction. We present a meta-learning based policy gradient method for addressing the problem of adaptation in human-robot interaction and also investigate its role as a mechanism for trust modelling. By building an escape room scenario in mixed reality with a robot, we test our hypothesis that bi-directional trust can be influenced by different adaptation algorithms. We found that our proposed model increased the perceived trustworthiness of the robot and influenced the dynamics of gaining human's trust. Additionally, participants evaluated that the robot perceived them as more trustworthy during the interactions with the meta-learning based adaptation compared to the previously studied statistical adaptation model.
Data-efficiency is crucial for autonomous robots to adapt to new tasks and environments. In this work we focus on robotics problems with a budget of only 10-20 trials. This is a very challenging setting even for data-efficient approaches like Bayesian optimization (BO), especially when optimizing higher-dimensional controllers. Simulated trajectories can be used to construct informed kernels for BO. However, previous work employed supervised ways of extracting low-dimensional features for these. We propose a model and architecture for a sequential variational autoencoder that embeds the space of simulated trajectories into a lower-dimensional space of latent paths in an unsupervised way. We further compress the search space for BO by reducing exploration in parts of the state space that are undesirable, without requiring explicit constraints on controller parameters. We validate our approach with hardware experiments on a Daisy hexapod robot and an ABB Yumi manipulator. We also present simulation experiments with further comparisons to several baselines on Daisy and two manipulators. Our experiments indicate the proposed trajectory-based kernel with dynamic compression can offer ultra data-efficient optimization.
We address the problem of motion planning for a robotic manipulator with the task to place a grasped object in a cluttered environment. In this task, we need to locate a collision-free pose for the object that a) facilitates the stable placement of the object, b) is reachable by the robot manipulator and c) optimizes a user-given placement objective. Because of the placement objective, this problem is more challenging than classical motion planning where the target pose is defined from the start. To solve this task, we propose an anytime algorithm that integrates sampling-based motion planning for the robot manipulator with a novel hierarchical search for suitable placement poses. We evaluate our approach on a dual-arm robot for two different placement objectives, and observe its effectiveness even in challenging scenarios.