To solve multi-step manipulation tasks in the real world, an autonomous robot must take actions to observe its environment and react to unexpected observations. This may require opening a drawer to observe its contents or moving an object out of the way to examine the space behind it. If the robot fails to detect an important object, it must update its belief about the world and compute a new plan of action. Additionally, a robot whose actions are noisy will never arrive exactly at a desired state, yet it must still adjust accordingly to keep making progress toward the goal. In this work, we present an online planning and execution system for robots faced with these kinds of challenges. Our approach efficiently solves partially observable problems both in simulation and in a real-world kitchen.
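The belief update described above can be illustrated with a minimal sketch. It is not the system presented in this abstract. It assumes a discrete set of candidate object locations and a noisy detector with hypothetical detection and false-positive rates (`p_detect`, `p_false`). After the robot looks in one location (for example, by opening a drawer), Bayes' rule reweights every candidate, and the robot then replans by inspecting the most likely remaining location.

```python
def update_belief(belief, observed_location, detected, p_detect=0.9, p_false=0.05):
    """Bayes update of a discrete belief over where an object is.

    belief: dict mapping location -> prior probability (sums to 1).
    observed_location: the location the robot just inspected.
    detected: whether the (assumed) detector reported the object there.
    p_detect: hypothetical P(detection | object is at observed_location).
    p_false: hypothetical P(detection | object is elsewhere).
    """
    posterior = {}
    for loc, prior in belief.items():
        present = loc == observed_location
        if detected:
            likelihood = p_detect if present else p_false
        else:
            likelihood = (1.0 - p_detect) if present else (1.0 - p_false)
        posterior[loc] = prior * likelihood
    total = sum(posterior.values())
    # Normalize so the posterior is again a probability distribution.
    return {loc: p / total for loc, p in posterior.items()}


def next_location_to_inspect(belief):
    """Naive replanning rule (an assumption): look where the object is most likely."""
    return max(belief, key=belief.get)


# Usage: the drawer is opened but the object is not seen.
prior = {"drawer": 0.5, "counter": 0.3, "cabinet": 0.2}
posterior = update_belief(prior, "drawer", detected=False)
print(next_location_to_inspect(posterior))  # prints "counter"
```

A missed detection lowers, but does not eliminate, the probability that the object is in the drawer, because the assumed detector can fail.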
In this work, we bridge the gap between recent work on pose estimation and tracking to develop a powerful method for robots to track objects in their surroundings. Motion-Nets use a segmentation model to segment the scene, and separate translation and rotation models to identify the relative 6D motion of an object between two consecutive frames. We train our method on generated data of floating objects, and then test it on several prediction tasks, including one with a real PR2 robot, and on a toy control task with a simulated PR2 robot that was never seen during training. Motion-Nets are able to track the pose of objects with some quantitative accuracy for about 30-60 frames, including under occlusions and with distractors. Additionally, the single-step prediction errors remain low even after 100 frames. We also investigate an iterative correction procedure to improve performance on control tasks.
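To see how per-frame relative motions turn into a tracked pose, here is a minimal sketch of chaining them. It is an illustration, not the Motion-Nets implementation. It assumes each predicted relative motion is a 4x4 homogeneous transform expressed in a fixed camera frame, so the pose at frame t+1 is `delta @ pose_t`. For brevity, the hypothetical helper `make_transform` only builds rotations about the z-axis. Because every step multiplies in a new prediction, small single-step errors compound over long sequences.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def make_transform(yaw: float, tx: float, ty: float, tz: float) -> Matrix:
    """Homogeneous transform: rotation by `yaw` about z, then translation (tx, ty, tz)."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0, tx],
            [s, c, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


def track(initial_pose: Matrix, relative_motions: list[Matrix]) -> list[Matrix]:
    """Chain relative motions (assumed camera-frame) onto an initial pose."""
    poses = [initial_pose]
    for delta in relative_motions:
        poses.append(matmul(delta, poses[-1]))
    return poses


# Usage: an object at x = 1 rotated by four quarter turns about the camera z-axis.
start = make_transform(0.0, 1.0, 0.0, 0.0)
quarter_turn = make_transform(math.pi / 2, 0.0, 0.0, 0.0)
poses = track(start, [quarter_turn] * 4)
```

After one quarter turn the object sits at (0, 1, 0). After four turns it returns to its starting pose, up to floating-point error.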
Widespread adoption of self-driving cars will depend not only on their safety but also, to a large extent, on their ability to interact with human users. Just like human drivers, self-driving cars will be expected to understand and safely follow natural-language directions that suddenly alter the pre-planned route, whether according to the user's preferences or in the presence of ambiguities, particularly in locations with poor or outdated map coverage. To this end, we propose a language-grounded driving agent implementing a hierarchical policy using recurrent layers and gated attention. The hierarchical approach enables us to reason both in terms of high-level language instructions describing long time horizons and low-level, complex, continuous state/action spaces required for real-time control of a self-driving car. We train our policy with conditional imitation learning from realistic language data collected from human drivers and navigators. Through quantitative and interactive experiments within the CARLA framework, we show that our model can successfully interpret language instructions and follow them safely, even when generalizing to previously unseen environments. Code and video are available at https://sites.google.com/view/language-grounded-driving.
Teleoperation offers the possibility of imparting robotic systems with sophisticated reasoning skills, intuition, and creativity to perform tasks. However, current teleoperation solutions for high degree-of-actuation (DoA), multi-fingered robots are generally cost-prohibitive, while low-cost offerings usually provide reduced degrees of control. Herein, we present DexPilot, a low-cost, vision-based teleoperation system that allows complete control over the full 23-DoA robotic system by merely observing the bare human hand. DexPilot enables operators to carry out a variety of complex manipulation tasks that go beyond simple pick-and-place operations. This allows for the collection of high-dimensional, multimodal state-action data that can be leveraged in the future to learn sensorimotor policies for challenging manipulation tasks. System performance was measured through speed and reliability metrics across two human demonstrators on a variety of tasks. Videos of the experiments can be found at https://sites.google.com/view/dex-pilot.
Effective human-robot collaboration requires informed anticipation. The robot must simultaneously anticipate what the human will do and react both instantaneously and fluidly when its predictions are wrong. Moreover, the robot must plan its own actions in a way that accounts for these predictions while also recognizing that the human's own behavior will change based on what the robot does. This back-and-forth game of prediction and planning is extremely difficult to model well using standard techniques. In this work, we exploit the duality between behavior prediction and control explored in the Inverse Optimal Control (IOC) literature to design a novel Model Predictive Control (MPC) algorithm that simultaneously plans the robot's behavior and predicts the human's behavior in a joint optimal control model. In the process, we develop a novel technique for bridging finite-horizon motion optimizers to the problem of spatially consistent continuous optimization using explicit sparse reward terms, i.e., negative cost. We demonstrate the framework on a collection of cooperative human-robot handover experiments, both in simulation and in a real-world handover scenario.
RMPflow is a recently proposed policy-fusion framework based on differential geometry. While RMPflow has demonstrated promising performance, it requires the user to provide sensible subtask policies as Riemannian motion policies (RMPs: a motion policy and an importance matrix function), which can be a difficult design problem in its own right. We propose RMPfusion, a variation of RMPflow, to address this issue. RMPfusion supplements RMPflow with weight functions that can hierarchically reshape the Lyapunov functions of the subtask RMPs according to the current configuration of the robot and environment. This extra flexibility can remedy imperfect subtask RMPs provided by the user, improving the combined policy's performance. These weight functions can be learned by back-propagation. Moreover, we prove that, under mild restrictions on the weight functions, RMPfusion always yields a globally Lyapunov-stable motion policy. This implies that we can treat RMPfusion as a structured policy class in policy optimization that is guaranteed to generate stable policies, even during the immature phase of learning. We demonstrate these properties of RMPfusion in imitation learning experiments both in simulation and on a real-world robot.
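The role of the weight functions can be illustrated with a simplified sketch of metric-weighted policy combination. It is not RMPfusion itself. It assumes two subtask RMPs already defined in the same 2D space (identity Jacobians, curvature terms omitted), each given as a desired acceleration `a_i` with a metric `M_i`. The scalar weights `w_i` are fixed, hypothetical values standing in for the learned, configuration-dependent weight functions. Following the RMPflow-style resolve step, the combined acceleration is `(sum w_i M_i)^-1 sum w_i M_i a_i`.

```python
def combine_rmps(rmps, weights):
    """Combine 2D RMPs (accel, metric) with nonnegative scalar weights.

    rmps: list of (a, M) where a is a 2-vector and M a 2x2 PSD metric.
    weights: list of hypothetical scalars (stand-ins for learned weight functions).
    Returns the resolved acceleration (sum w M)^-1 (sum w M a).
    """
    M = [[0.0, 0.0], [0.0, 0.0]]
    f = [0.0, 0.0]
    for (a, Mi), w in zip(rmps, weights):
        for r in range(2):
            # Accumulate the weighted "force" w * M_i @ a_i.
            f[r] += w * sum(Mi[r][c] * a[c] for c in range(2))
            for c in range(2):
                # Accumulate the weighted metric w * M_i.
                M[r][c] += w * Mi[r][c]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    if abs(det) < 1e-12:
        raise ValueError("combined metric is singular; no direction is constrained")
    inv = [[M[1][1] / det, -M[0][1] / det],
           [-M[1][0] / det, M[0][0] / det]]
    return [sum(inv[r][c] * f[c] for c in range(2)) for r in range(2)]


# Usage: a goal RMP pulling along x, and an obstacle RMP that only cares about y.
goal = ([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
avoid = ([0.0, 2.0], [[0.0, 0.0], [0.0, 1.0]])
print(combine_rmps([goal, avoid], [1.0, 1.0]))  # prints [1.0, 1.0]
print(combine_rmps([goal, avoid], [1.0, 3.0]))  # prints [1.0, 1.5]
```

Raising the obstacle weight shifts the y-component toward the avoidance policy while leaving the x-component, which the obstacle metric ignores, unchanged. This mirrors in miniature how reshaping subtask contributions can compensate for an imperfect subtask policy.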
Visual topological navigation has recently been revitalized thanks to advances in deep learning that substantially improve robot perception. However, scalability and reliability remain challenging due to the complexity and ambiguity of real-world images and the mechanical constraints of real robots. We present an intuitive solution showing that, by accurately measuring the capability of a local controller, visual topological navigation can be made both scalable and robust in large-scale environments. Our approach achieves state-of-the-art results in trajectory following and planning in large-scale environments. It also generalizes well to real robots and new environments without retraining or fine-tuning.
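One way a local controller's measured capability can feed into topological planning is sketched below. This is an illustrative assumption, not the method in this abstract. Suppose each graph edge carries an estimated probability `p_success` that the local controller can traverse it. Edges below a hypothetical threshold `min_success` are pruned. Dijkstra's algorithm on edge costs `-log(p)` then returns the path that maximizes the product of success probabilities.

```python
import heapq
import math


def plan(edges, start, goal, min_success=0.4):
    """Most-reliable path in a topological graph.

    edges: dict node -> list of (neighbor, p_success), where p_success is an
        assumed estimate of the local controller's chance to reach the neighbor.
    min_success: hypothetical pruning threshold (must be > 0).
    Returns (path, overall success probability), or (None, 0.0) if unreachable.
    """
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == goal:
            break
        for nbr, p in edges.get(node, []):
            if p < min_success:
                continue  # controller is unlikely to traverse this edge
            nd = d - math.log(p)  # summing -log(p) multiplies probabilities
            if nd < dist.get(nbr, math.inf):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if goal not in dist:
        return None, 0.0
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    path.reverse()
    return path, math.exp(-dist[goal])


# Usage: a small hypothetical graph of image nodes.
graph = {
    "A": [("B", 0.9), ("C", 0.99), ("D", 0.3)],
    "B": [("D", 0.9)],
    "C": [("D", 0.5)],
}
path, prob = plan(graph, "A", "D")
print(path)  # prints ['A', 'B', 'D']
```

The direct edge A to D is pruned as unreliable. The route through B (success about 0.81) beats the route through C (about 0.495), even though C's first hop is the most reliable single edge.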
To teach robots skills, it is crucial to obtain supervised data. Since annotating real-world data is time-consuming and expensive, enabling robots to learn in a self-supervised way is important. In this work, we introduce a robot system for self-supervised 6D object pose estimation. Starting from modules trained in simulation, our system is able to label real-world images with accurate 6D object poses for self-supervised learning. In addition, the robot changes the configuration of objects in the environment by grasping or pushing them. In this way, our system is able to continuously collect data and improve its pose estimation modules. We show that the self-supervised learning improves object segmentation and 6D pose estimation performance, and consequently enables the system to grasp objects more reliably. A video showing the experiments can be found at https://youtu.be/W1Y0Mmh1Gd8.