Jan Peters

Controlling the Cascade: Kinematic Planning for N-ball Toss Juggling

Jul 04, 2022
Kai Ploeger, Jan Peters

Dynamic movements are ubiquitous in human motor behavior, as they tend to be more efficient and can solve a broader range of skill domains than their quasi-static counterparts. For decades, robotic juggling tasks have been among the most frequently studied dynamic manipulation problems, since the required dynamic dexterity can be scaled to arbitrarily high difficulty. However, successful approaches have been limited to basic juggling skills, indicating a lack of understanding of the constraints required for dexterous toss juggling. We present a detailed analysis of the toss juggling task, identifying the key challenges and formalizing it as a trajectory optimization problem. Building on our state-of-the-art, real-world toss juggling platform, we reach the theoretical limits of toss juggling in simulation, evaluate the resulting real-time controller in environments of varying difficulty, and achieve robust toss juggling of up to 17 balls on two anthropomorphic manipulators.
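
For intuition, the timing constraints that make high-N toss juggling hard follow from simple ballistic flight. The sketch below is not the paper's trajectory optimization; it is a minimal calculation assuming an idealized cascade with alternating hands and a fixed dwell ratio, with an illustrative parametrization:

```python
G = 9.81  # gravitational acceleration [m/s^2]

def cascade_throw(n_balls: int, beat: float, dwell_ratio: float = 0.6):
    """Ballistic throw parameters for an n_balls cascade on two hands.

    beat:        time between consecutive throws (hands alternate) [s]
    dwell_ratio: fraction of a hand's cycle spent holding a ball
    """
    dwell = dwell_ratio * 2.0 * beat     # each hand throws once every 2 beats
    flight = n_balls * beat - dwell      # airborne time per ball [s]
    v_throw = 0.5 * G * flight           # vertical release speed for a symmetric arc
    apex = v_throw ** 2 / (2.0 * G)      # apex height above the release point [m]
    return flight, v_throw, apex

# e.g., 17 balls at 4 throws/s:
# flight, v, h = cascade_throw(17, 0.25)  # ~3.95 s, ~19.4 m/s, ~19.1 m
```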

Learning Implicit Priors for Motion Optimization

Apr 11, 2022
Alexander Lambert, An T. Le, Julen Urain, Georgia Chalvatzaki, Byron Boots, Jan Peters

In this paper, we focus on the problem of integrating Energy-based Models (EBM) as guiding priors for motion optimization. EBMs are neural networks that represent expressive probability densities as Gibbs distributions parameterized by a suitable energy function. Due to their implicit nature, they can easily be integrated as optimization factors or as initial sampling distributions in motion optimization, making them good candidates for introducing data-driven priors. In this work, we present the modeling and algorithmic choices required to adapt EBMs to motion optimization. We investigate the benefit of including additional regularizers when learning EBMs so that they can be used with gradient-based optimizers, and we present a set of EBM architectures for learning generalizable distributions for manipulation tasks. We present multiple cases in which the EBM can be integrated into motion optimization and evaluate the performance of learned EBMs as guiding priors in both simulated and real robot experiments.
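
As an illustration of how an implicit prior enters the optimization, here is a minimal sketch: a small MLP energy E_theta(x) added as a weighted cost factor in gradient-based trajectory optimization. The architecture and the `task_cost` interface are assumptions for illustration, not the paper's exact models:

```python
import torch
import torch.nn as nn

class EnergyPrior(nn.Module):
    """MLP energy E_theta(x); the implied density is p(x) ~ exp(-E_theta(x))."""
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, 128), nn.SiLU(),
            nn.Linear(128, 128), nn.SiLU(),
            nn.Linear(128, 1),
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)

def optimize_with_prior(traj, ebm, task_cost, steps=200, lr=1e-2, w_prior=0.1):
    """Gradient-based motion optimization with the EBM as an extra cost factor."""
    traj = traj.clone().requires_grad_(True)    # (T, dim) waypoints
    opt = torch.optim.Adam([traj], lr=lr)
    for _ in range(steps):
        # low energy = high prior probability of the motion
        loss = task_cost(traj) + w_prior * ebm(traj).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return traj.detach()
```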

* 15 pages, paper website: https://sites.google.com/view/implicit-priors/home 

Revisiting Model-based Value Expansion

Mar 28, 2022
Daniel Palenicek, Michael Lutter, Jan Peters

Model-based value expansion methods promise to improve the quality of value function targets and, thereby, the effectiveness of value function learning. However, to date, these methods have been outperformed by Dyna-style algorithms with conceptually simpler 1-step value function targets, suggesting that in practice the theoretical justification of value expansion does not hold. We provide a thorough empirical study to shed light on the causes of failure of value expansion methods, commonly believed to be compounding model error. By leveraging GPU-based physics simulators, we are able to efficiently use the true dynamics for analysis inside the model-based reinforcement learning loop. Extensive comparisons between true and learned dynamics shed light on this black box and provide a better understanding of the actual problems in value expansion. We also point out future directions of research by empirically testing the maximum theoretical performance of current approaches.
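
For reference, the H-step value expansion target being revisited can be written compactly. A minimal sketch, where `policy`, `model`, and `value_fn` are assumed callables, and the model can be either learned dynamics or, as in the paper's analysis, the true simulator dynamics:

```python
import torch

def mve_target(s, policy, model, value_fn, horizon=3, gamma=0.99):
    """H-step model-based value expansion target:
    sum_{t<H} gamma^t * r_t + gamma^H * V(s_H), rolled out with `model`.
    """
    ret = torch.zeros(s.shape[0])   # accumulated discounted reward per batch element
    disc = 1.0
    for _ in range(horizon):
        a = policy(s)
        s, r = model(s, a)          # learned dynamics -- or the true ones for analysis
        ret = ret + disc * r
        disc *= gamma
    return ret + disc * value_fn(s) # bootstrap with the value function at s_H
```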

Accelerating Integrated Task and Motion Planning with Neural Feasibility Checking

Mar 20, 2022
Lei Xu, Tianyu Ren, Georgia Chalvatzaki, Jan Peters

As robots play an increasingly important role in industry, expectations about their application to everyday living tasks are rising. Robots need to perform long-horizon tasks that consist of several sub-tasks. Task and Motion Planning (TAMP) provides a hierarchical framework to handle the sequential nature of manipulation tasks by interleaving a symbolic task planner, which generates a possible action sequence, with a motion planner, which checks kinematic feasibility in the geometric world and generates robot trajectories if several constraints are satisfied, e.g., a collision-free trajectory from one state to another. Hence, reasoning about the task plan's geometric grounding is taken over by the motion planner. However, motion planning is computationally intensive, and its use as a feasibility checker renders TAMP methods inapplicable to real-world scenarios. In this paper, we introduce the neural feasibility classifier (NFC), a simple yet effective visual heuristic for classifying the feasibility of proposed actions in TAMP. Namely, NFC identifies infeasible actions of the task planner without the need for costly motion planning, hence reducing planning time in multi-step manipulation tasks. NFC encodes an image of the robot's workspace into a feature map using a convolutional neural network (CNN). We train NFC on simulated data from TAMP problems and label the instances based on inverse kinematics (IK) feasibility checks. Our empirical results on different simulated manipulation tasks show that NFC generalizes to the entire robot workspace and has high prediction accuracy even in scenes with multiple obstructions. When combined with state-of-the-art integrated TAMP, NFC enhances its performance while reducing planning time.
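
A minimal sketch of such a feasibility classifier follows: a small CNN encodes the workspace image, the encoding is concatenated with an action feature vector, and a sigmoid head outputs the probability that motion planning would succeed. Layer sizes and the action encoding are illustrative assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class NeuralFeasibilityClassifier(nn.Module):
    """CNN mapping a workspace image + an action encoding to P(feasible)."""
    def __init__(self, action_dim: int):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),        # -> 32 * 4 * 4 features
        )
        self.head = nn.Sequential(
            nn.Linear(32 * 16 + action_dim, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, image, action):
        z = self.encoder(image)                           # feature map of the scene
        return torch.sigmoid(self.head(torch.cat([z, action], dim=-1)))

# Inside the task planner, actions predicted infeasible are pruned before
# any motion planner is invoked, e.g.:
#   if nfc(image, encode(action)) < 0.5: prune(action)
```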

* 6 pages, 6 figures

Dimensionality Reduction and Prioritized Exploration for Policy Search

Mar 19, 2022
Marius Memmel, Puze Liu, Davide Tateo, Jan Peters

Black-box policy optimization is a class of reinforcement learning algorithms that explores and updates policies at the parameter level. This class of algorithms is widely applied in robotics with movement primitives or non-differentiable policies, and is particularly relevant where exploration at the action level could cause actuator damage or other safety issues. However, black-box optimization does not scale well with increasing policy dimensionality, leading to a high demand for samples, which are expensive to obtain in real-world systems. In many practical applications, policy parameters do not contribute equally to the return; identifying the most relevant parameters allows narrowing down the exploration and speeding up learning. Furthermore, updating only the effective parameters requires fewer samples, improving the scalability of the method. We present a novel method to prioritize the exploration of effective parameters while coping with full covariance matrix updates. Our algorithm learns faster than recent approaches and requires fewer samples to achieve state-of-the-art results. To select the effective parameters, we consider both the Pearson correlation coefficient and the Mutual Information. We showcase the capabilities of our approach on the Relative Entropy Policy Search algorithm in several simulated environments, including robotics simulations. Code is available at https://git.ias.informatik.tu-darmstadt.de/ias_code/aistats2022/dr-creps.
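
As an example of the parameter-selection step, the sketch below ranks parameters by the absolute Pearson correlation between sampled parameter values and episodic returns; the Mutual Information variant would swap in an MI estimator such as scikit-learn's `mutual_info_regression`. This is a simplified illustration of the selection criterion, not the full algorithm:

```python
import numpy as np

def effective_params(theta, returns, k):
    """Rank policy parameters by |Pearson correlation| with the return
    and keep the top-k for prioritized exploration.

    theta:   (n_samples, n_params) sampled parameter vectors
    returns: (n_samples,) episodic returns
    """
    corr = np.array([np.corrcoef(theta[:, i], returns)[0, 1]
                     for i in range(theta.shape[1])])
    return np.argsort(-np.abs(corr))[:k]   # indices of the k most relevant parameters
```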

* The 25th International Conference on Artificial Intelligence and Statistics (AISTATS) 

Regularized Deep Signed Distance Fields for Reactive Motion Generation

Mar 09, 2022
Puze Liu, Kuo Zhang, Davide Tateo, Snehal Jauhri, Jan Peters, Georgia Chalvatzaki

Autonomous robots should operate in real-world dynamic environments and collaborate with humans in tight spaces. A key component for allowing robots to leave structured lab and manufacturing settings is their ability to evaluate collisions with the world around them online and in real time. Distance-based constraints are fundamental for enabling robots to plan their actions and act safely, protecting both humans and their hardware. However, different applications require different distance resolutions, leading to various heuristic approaches for measuring distance fields w.r.t. obstacles, which are computationally expensive and hinder their application in dynamic obstacle-avoidance use cases. We propose Regularized Deep Signed Distance Fields (ReDSDF), a single neural implicit function that can compute smooth distance fields at any scale, with fine-grained resolution over high-dimensional manifolds and articulated bodies like humans, thanks to our effective data generation and a simple inductive bias during training. We demonstrate the effectiveness of our approach in representative simulated tasks for whole-body control (WBC) and safe Human-Robot Interaction (HRI) in shared workspaces. Finally, we provide a proof of concept of a real-world application in an HRI handover task with a mobile manipulator robot.
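
For a concrete picture of a neural implicit distance field, here is a minimal sketch: an MLP distance function fitted to surface and free-space samples, with a regularizer pushing the gradient toward unit norm. The eikonal-style term is a common stand-in used here for illustration; ReDSDF's actual regularization and data generation differ in detail:

```python
import torch
import torch.nn as nn

class DeepSDF(nn.Module):
    """Neural implicit distance field d_theta: R^3 -> R."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, 256), nn.Softplus(beta=100),
            nn.Linear(256, 256), nn.Softplus(beta=100),
            nn.Linear(256, 1),
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)

def sdf_loss(model, x_surface, x_free, d_free, w_reg=0.1):
    """Fit the zero level set and supervised distances; regularize the gradient."""
    loss = model(x_surface).abs().mean()                   # surface points: d = 0
    loss = loss + (model(x_free) - d_free).pow(2).mean()   # free space: match distances
    x = x_free.clone().requires_grad_(True)                # eikonal term: ||grad d|| ~ 1
    grad = torch.autograd.grad(model(x).sum(), x, create_graph=True)[0]
    return loss + w_reg * (grad.norm(dim=-1) - 1.0).pow(2).mean()
```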

Graph-based Reinforcement Learning meets Mixed Integer Programs: An application to 3D robot assembly discovery

Mar 08, 2022
Niklas Funk, Svenja Menzenbach, Georgia Chalvatzaki, Jan Peters

Robot assembly discovery is a challenging problem that lives at the intersection of resource allocation and motion planning. The goal is to combine a predefined set of objects to form something new while considering task execution with the robot in the loop. In this work, we tackle the problem of building arbitrary, predefined target structures entirely from scratch using a set of Tetris-like building blocks and a robotic manipulator. Our novel hierarchical approach efficiently decomposes the overall task into three feasible levels that mutually benefit from each other. At the high level, we run a classical mixed-integer program for global optimization of block-type selection and the blocks' final poses to recreate the desired shape. Its output is then exploited to efficiently guide the exploration of an underlying reinforcement learning (RL) policy. This RL policy draws its generalization properties from a flexible graph-based representation that is learned through Q-learning and can be refined with search. Moreover, it accounts for the necessary conditions of structural stability and robotic feasibility that cannot be effectively reflected in the previous layer. Lastly, a grasp and motion planner transforms the desired assembly commands into robot joint movements. We demonstrate the performance of the proposed method on a set of competitive simulated robot assembly discovery environments and report performance and robustness gains compared to an unstructured end-to-end approach. Videos are available at https://sites.google.com/view/rl-meets-milp .
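
How a MIP solution can guide RL exploration admits a very small illustration: an epsilon-greedy action selector that, when exploring, prefers the placement suggested by the mixed-integer program. This is a hypothetical simplification of the guidance mechanism, not the paper's graph-based policy:

```python
import numpy as np

def guided_epsilon_greedy(q_values, mip_action, eps=0.2, p_mip=0.5):
    """Exploration biased toward the MIP plan's suggested action."""
    if np.random.rand() < eps:
        if np.random.rand() < p_mip:
            return mip_action                        # follow the global MIP solution
        return np.random.randint(len(q_values))      # uniform exploration
    return int(np.argmax(q_values))                  # greedy w.r.t. the learned Q
```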

Robot Learning of Mobile Manipulation with Reachability Behavior Priors

Mar 08, 2022
Snehal Jauhri, Jan Peters, Georgia Chalvatzaki

Mobile Manipulation (MM) systems are ideal candidates for taking on the role of a personal assistant in unstructured real-world environments. Among other challenges, MM requires effective coordination of the robot's embodiments for executing tasks that require both mobility and manipulation. Reinforcement Learning (RL) holds the promise of endowing robots with adaptive behaviors, but most methods require prohibitively large amounts of data to learn a useful control policy. In this work, we study the integration of robotic reachability priors into actor-critic RL methods for accelerating the learning of MM for reaching and fetching tasks. Namely, we consider the problem of optimal base placement and the subsequent decision of whether to activate the arm for reaching a 6D target. For this, we devise a novel Hybrid RL method that handles discrete and continuous actions jointly, resorting to the Gumbel-Softmax reparameterization. Next, we train a reachability prior using data from the operational robot workspace, inspired by classical methods. Subsequently, we derive Boosted Hybrid RL (BHyRL), a novel algorithm for learning Q-functions by modeling them as a sum of residual approximators. Every time a new task needs to be learned, we can transfer our learned residuals and learn only the task-specific component of the Q-function, hence maintaining the task structure from prior behaviors. Moreover, we find that regularizing the target policy with a prior policy yields more expressive behaviors. We evaluate our method in simulation on reaching and fetching tasks of increasing difficulty and show the superior performance of BHyRL against baseline methods. Finally, we transfer our learned 6D fetching policy zero-shot to our MM robot TIAGo++. For more details and code release, please refer to our project site: irosalab.com/rlmmbp
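
Two of the building blocks can be sketched compactly: a hybrid discrete-continuous action sampled via the Gumbel-Softmax relaxation, and a Q-function formed as a sum of residual approximators. The function signatures are illustrative assumptions, not the paper's exact interfaces:

```python
import torch
import torch.nn.functional as F

def hybrid_action(logits, mean, log_std, tau=1.0):
    """Differentiable hybrid action: a relaxed discrete choice (Gumbel-Softmax)
    plus reparameterized continuous parameters."""
    discrete = F.gumbel_softmax(logits, tau=tau, hard=True)   # one-hot, straight-through
    continuous = mean + log_std.exp() * torch.randn_like(mean)
    return discrete, continuous

def boosted_q(residuals, s, a):
    """Q modeled as a sum of residual approximators: earlier residuals are
    frozen priors from previously learned tasks, the last one is task-specific."""
    return sum(r(s, a) for r in residuals)
```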

* Submitted to RAL-IROS 2022 

A Hierarchical Approach to Active Pose Estimation

Mar 08, 2022
Jascha Hellwig, Mark Baierl, Joao Carvalho, Julen Urain, Jan Peters

Creating mobile robots that are able to find and manipulate objects in large environments is an active topic of research. These robots not only need to be capable of searching for specific objects but also of estimating their poses, often relying on environment observations, which is even more difficult in the presence of occlusions. To tackle this problem, we propose a simple hierarchical approach to estimate the pose of a desired object. An Active Visual Search module operating on RGB images first obtains a rough estimate of the object's 2D pose, followed by a more computationally expensive Active Pose Estimation module using point-cloud data. We empirically show that processing image features to obtain a richer observation speeds up the search and pose estimation computations, compared to a binary decision that only indicates whether or not the object is in the current image.
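
The coarse-to-fine control flow can be sketched in a few lines. Here `avs` (2D search) and `ape` (point-cloud refinement) are hypothetical interfaces standing in for the two modules, and the confidence threshold is an assumed parameter:

```python
def estimate_pose(robot, avs, ape, conf_2d=0.8, max_views=20):
    """Coarse-to-fine pose estimation: a cheap RGB-based search proposes a
    region of interest before the expensive point-cloud stage is invoked."""
    for _ in range(max_views):
        view = avs.next_best_view(robot.rgb_image())   # RGB-only, cheap to evaluate
        robot.move_to(view)
        score, roi = avs.detect(robot.rgb_image())     # rough 2D pose + confidence
        if score > conf_2d:
            return ape.refine(robot.point_cloud(roi))  # expensive 6D refinement
    return None                                        # object not found
```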
