Jan Peters

Batch Reinforcement Learning with a Nonparametric Off-Policy Policy Gradient

Oct 29, 2020

Samuele Tosatto, João Carvalho, Jan Peters

Abstract: Off-policy Reinforcement Learning (RL) holds the promise of better data efficiency as it allows sample reuse and potentially enables safe interaction with the environment. Current off-policy policy gradient methods suffer from either high bias or high variance, often delivering unreliable estimates. The price of inefficiency becomes evident in real-world scenarios such as interaction-driven robot learning, where the success of RL has been rather limited, and a very high sample cost hinders straightforward application. In this paper, we propose a nonparametric Bellman equation, which can be solved in closed form. The solution is differentiable w.r.t. the policy parameters and gives access to an estimate of the policy gradient. In this way, we avoid the high variance of importance sampling approaches and the high bias of semi-gradient methods. We empirically analyze the quality of our gradient estimate against state-of-the-art methods, and show that it outperforms the baselines in terms of sample efficiency on classical control tasks.

* arXiv admin note: substantial text overlap with arXiv:2001.02435
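
To make the idea of a Bellman equation "solved in closed form" concrete, here is a minimal sketch, not the estimator from the paper. It assumes a fixed batch of transitions, a Gaussian kernel with a hypothetical `bandwidth` parameter, and it ignores the dependence on the policy (and hence the gradient) entirely. The names `gaussian_kernel_matrix` and `closed_form_values` are illustrative only.

```python
import numpy as np

def gaussian_kernel_matrix(x, y, bandwidth):
    """Row-normalised Gaussian kernel weights between the rows of x and y."""
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    k = np.exp(-0.5 * d2 / bandwidth**2)
    return k / k.sum(axis=1, keepdims=True)

def closed_form_values(states, rewards, next_states, gamma=0.9, bandwidth=0.5):
    """Solve v = r + gamma * P v on the sample points (illustrative assumption).

    P[i, j] weights sample j by its kernel similarity to the i-th next state,
    so each row of P sums to one and (I - gamma * P) is invertible for gamma < 1.
    """
    P = gaussian_kernel_matrix(next_states, states, bandwidth)
    n = len(rewards)
    return np.linalg.solve(np.eye(n) - gamma * P, rewards)

# Tiny synthetic batch: 1-D states drifting to the right, reward 1 everywhere.
states = np.linspace(0.0, 1.0, 5).reshape(-1, 1)
next_states = states + 0.1
rewards = np.ones(5)
values = closed_form_values(states, rewards, next_states)
```

Because the solution is a single linear solve, it is differentiable with respect to anything that enters `P` or `rewards`, which is the property the abstract relies on.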

Contextual Latent-Movements Off-Policy Optimization for Robotic Manipulation Skills

Oct 26, 2020

Samuele Tosatto, Georgia Chalvatzaki, Jan Peters

Abstract: Parameterized movement primitives have been extensively used for imitation learning of robotic tasks. However, the high dimensionality of the parameter space hinders the improvement of such primitives in the reinforcement learning (RL) setting, especially for learning with physical robots. In this paper we propose a novel view on handling the demonstrated trajectories for acquiring low-dimensional, non-linear latent dynamics, using mixtures of probabilistic principal component analyzers (MPPCA) on the movements' parameter space. Moreover, we introduce a new contextual off-policy RL algorithm, named LAtent-Movements Policy Optimization (LAMPO). LAMPO can provide gradient estimates from previous experience using self-normalized importance sampling, hence making full use of samples collected in previous learning iterations. These advantages combined provide a complete framework for sample-efficient off-policy optimization of movement primitives for robot learning of high-dimensional manipulation skills. Our experimental results, obtained both in simulation and on a real robot, show that LAMPO provides sample-efficient policies compared with common approaches in the literature.
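
For readers unfamiliar with self-normalized importance sampling, the following is a minimal sketch of the general technique only. It estimates an expected return under a target policy from samples drawn under a behavior policy; it is not LAMPO's gradient estimator, and the function name `snis_estimate` and its arguments are assumptions made for illustration.

```python
import numpy as np

def snis_estimate(returns, target_logp, behavior_logp):
    """Self-normalised importance sampling estimate of the expected return.

    returns:       observed returns of samples drawn from the behavior policy
    target_logp:   log-probability of each sample under the target policy
    behavior_logp: log-probability of each sample under the behavior policy
    """
    returns = np.asarray(returns, dtype=float)
    log_w = np.asarray(target_logp, dtype=float) - np.asarray(behavior_logp, dtype=float)
    log_w = log_w - log_w.max()   # subtract the max for numerical stability
    w = np.exp(log_w)
    w = w / w.sum()               # normalise weights so they sum to one
    return float(np.dot(w, returns))

# When target and behavior policies agree, the estimate is the plain average.
estimate = snis_estimate([1.0, 2.0, 3.0], [-1.0, -1.0, -1.0], [-1.0, -1.0, -1.0])
```

Dividing by the sum of the weights rather than the sample count introduces a small bias but typically lowers variance, which is why the self-normalized form is common when reusing old samples.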

ImitationFlow: Learning Deep Stable Stochastic Dynamic Systems by Normalizing Flows

Oct 25, 2020

Julen Urain, Michelle Ginesi, Davide Tateo, Jan Peters

Abstract: We introduce ImitationFlow, a novel deep generative model that allows learning complex, globally stable, stochastic, nonlinear dynamics. Our approach extends the Normalizing Flows framework to learn stable Stochastic Differential Equations. We prove Lyapunov stability for a class of Stochastic Differential Equations and we propose an algorithm to learn them from a set of demonstrated trajectories. Our model extends the set of stable dynamical systems that can be represented by state-of-the-art approaches, eliminates the Gaussian assumption on the demonstrations, and outperforms the previous algorithms in terms of representation accuracy. We show the effectiveness of our method with both standard datasets and a real robot experiment.

* 7 pages, 7 figures, IROS 2020

A Differentiable Newton Euler Algorithm for Multi-body Model Learning

Oct 19, 2020

Michael Lutter, Johannes Silberbauer, Joe Watson, Jan Peters

Abstract: In this work, we examine a spectrum of hybrid models for the domain of multi-body robot dynamics. We motivate a computation graph architecture that embodies the Newton Euler equations, emphasizing the utility of the Lie Algebra form in translating the dynamical geometry into an efficient computational structure for learning. We describe the virtual parameters used, which enable unconstrained, physically plausible dynamics, and the actuator models used. In the experiments, we define a family of 26 grey-box models and evaluate them for system identification of the simulated and physical Furuta Pendulum and Cartpole. The comparison shows that the kinematic parameters, required by previous white-box system identification methods, can be accurately inferred from data. Furthermore, we highlight that models with guaranteed bounded energy of the uncontrolled system generate non-divergent trajectories, while more general models have no such guarantee, so their performance strongly depends on the data distribution. Therefore, the main contribution of this work is the introduction of a white-box model that jointly learns dynamic and kinematic parameters and can be combined with black-box components. We then provide an extensive empirical evaluation on challenging systems and different datasets that elucidates the comparative performance of our grey-box architecture with comparable white- and black-box models.

* ICML 2020 Workshop on Inductive Biases, Invariances and Generalization in Reinforcement Learning

Differentiable Implicit Layers

Oct 14, 2020

Andreas Look, Simona Doneva, Melih Kandemir, Rainer Gemulla, Jan Peters

Abstract: In this paper, we introduce an efficient backpropagation scheme for non-constrained implicit functions. These functions are parametrized by a set of learnable weights and may optionally depend on some input, making them well suited as learnable layers in a neural network. We demonstrate our scheme on different applications: (i) neural ODEs with the implicit Euler method, and (ii) system identification in model predictive control.
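
As a small illustration of why the implicit Euler method yields an implicit function, the sketch below takes one step for a linear ODE dx/dt = A x, where the next state is defined only through the equation x_next = x + h * A @ x_next. This is a textbook example under that linearity assumption, not the backpropagation scheme of the paper; `implicit_euler_step` is a hypothetical helper name.

```python
import numpy as np

def implicit_euler_step(A, x, h):
    """One implicit Euler step for dx/dt = A x.

    The update x_next = x + h * A @ x_next is implicit in x_next;
    for a linear system it rearranges to (I - h A) x_next = x.
    """
    n = A.shape[0]
    return np.linalg.solve(np.eye(n) - h * A, x)

# A stiff diagonal system: the fast mode decays stably even with a large step.
A = np.array([[-50.0, 0.0],
              [0.0, -1.0]])
x = np.array([1.0, 1.0])
x_next = implicit_euler_step(A, x, h=0.1)
```

For a nonlinear right-hand side the linear solve would be replaced by a root-finding iteration, and differentiating through that solve is where an implicit-function backpropagation scheme becomes relevant.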

Active Inference or Control as Inference? A Unifying View

Oct 01, 2020

Joe Watson, Abraham Imohiosen, Jan Peters

Abstract: Active inference (AI) is a persuasive theoretical framework from computational neuroscience that seeks to describe action and perception as inference-based computation. However, this framework has yet to provide practical sensorimotor control algorithms that are competitive with alternative approaches. In this work, we frame active inference through the lens of control as inference (CaI), a body of work that presents trajectory optimization as inference. From the wider view of 'probabilistic numerics', CaI offers principled, numerically robust optimal control solvers that provide uncertainty quantification, and can scale to nonlinear problems with approximate inference. We show that AI may be framed as partially-observed CaI when the cost function is defined specifically in the observation states.

* International Workshop on Active Inference 2020 (IWAI)

Advances in Human-Robot Handshaking

Aug 26, 2020

Vignesh Prasad, Ruth Stock-Homburg, Jan Peters

Abstract: The use of social, anthropomorphic robots to support humans in various industries has been on the rise. During Human-Robot Interaction (HRI), physically interactive non-verbal behaviour is key for more natural interactions. Handshaking is one such natural interaction used commonly in many social contexts. It is one of the first non-verbal interactions which takes place and should, therefore, be part of the repertoire of a social robot. In this paper, we explore the existing state of Human-Robot Handshaking and discuss possible ways forward for such physically interactive behaviours.

* Accepted at The 12th International Conference on Social Robotics (ICSR 2020) 12 Pages, 1 Figure

Assisted Teleoperation in Changing Environments with a Mixture of Virtual Guides

Aug 12, 2020

Marco Ewerton, Oleg Arenz, Jan Peters

Abstract: Haptic guidance is a powerful technique to combine the strengths of humans and autonomous systems for teleoperation. The autonomous system can provide haptic cues to enable the operator to perform precise movements; the operator can interfere with the plan of the autonomous system leveraging his/her superior cognitive capabilities. However, providing haptic cues such that the individual strengths are not impaired is challenging because low forces provide little guidance, whereas strong forces can hinder the operator in realizing his/her plan. Based on variational inference, we learn a Gaussian mixture model (GMM) over trajectories to accomplish a given task. The learned GMM is used to construct a potential field which determines the haptic cues. The potential field smoothly changes during teleoperation based on our updated belief over the plans and their respective phases. Furthermore, new plans are learned online when the operator does not follow any of the proposed plans, or after changes in the environment. User studies confirm that our framework helps users perform teleoperation tasks more accurately than without haptic cues and, in some cases, faster. Moreover, we demonstrate the use of our framework to help a subject teleoperate a 7 DoF manipulator in a pick-and-place task.

* Advanced Robotics, 2020
* 19 pages, 9 figures

Model-Based Quality-Diversity Search for Efficient Robot Learning

Aug 11, 2020

Leon Keller, Daniel Tanneberg, Svenja Stark, Jan Peters

Abstract: Despite recent progress in robot learning, it still remains a challenge to program a robot to deal with open-ended object manipulation tasks. One approach that was recently used to autonomously generate a repertoire of diverse skills is a novelty-based Quality-Diversity (QD) algorithm. However, like most evolutionary algorithms, QD suffers from sample inefficiency and, thus, it is challenging to apply it in real-world scenarios. This paper tackles this problem by integrating a neural network that predicts the behavior of the perturbed parameters into a novelty-based QD algorithm. In the proposed Model-based Quality-Diversity search (M-QD), the network is trained concurrently with the repertoire and is used to avoid executing unpromising actions in the novelty search process. Furthermore, it is used to adapt the skills of the final repertoire in order to generalize the skills to different scenarios. Our experiments show that enhancing a QD algorithm with such a forward model improves the sample efficiency and performance of the evolutionary process and the skill adaptation.

* IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2020

Multi-Sensor Next-Best-View Planning as Matroid-Constrained Submodular Maximization

Jul 04, 2020

Mikko Lauri, Joni Pajarinen, Jan Peters, Simone Frintrop

Abstract: 3D scene models are useful in robotics for tasks such as path planning, object manipulation, and structural inspection. We consider the problem of creating a 3D model using depth images captured by a team of multiple robots. Each robot selects a viewpoint and captures a depth image from it, and the images are fused to update the scene model. The process is repeated until a scene model of desired quality is obtained. Next-best-view planning uses the current scene model to select the next viewpoints. The objective is to select viewpoints so that the images captured using them improve the quality of the scene model the most. In this paper, we address next-best-view planning for multiple depth cameras. We propose a utility function that scores sets of viewpoints and avoids overlap between multiple sensors. We show that multi-sensor next-best-view planning with this utility function is an instance of submodular maximization under a matroid constraint. This allows the planning problem to be solved by a polynomial-time greedy algorithm that yields a solution within a constant factor of the optimum. We evaluate the performance of our planning algorithm in simulated experiments with up to 8 sensors, and in real-world experiments using two robot arms equipped with depth cameras.

* 8 pages, 7 figures. Accepted for publication in IEEE Robotics and Automation Letters
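
The greedy scheme mentioned in the abstract can be sketched generically as follows. This is an illustration under stated assumptions, not the paper's planner: the utility is assumed to be simple set coverage (number of distinct scene cells seen), the matroid is assumed to be a partition matroid allowing one viewpoint per sensor, and `greedy_partition_matroid` with its input layout is hypothetical.

```python
def greedy_partition_matroid(candidates):
    """Greedy viewpoint selection with one viewpoint per sensor.

    candidates: dict mapping sensor -> dict mapping viewpoint -> set of cells
                that viewpoint would observe (assumed coverage utility).
    Returns the chosen viewpoint per sensor and the union of covered cells.
    """
    chosen = {}
    covered = set()
    remaining = set(candidates)
    while remaining:
        best, best_gain = None, -1
        # Scan every still-unassigned sensor and each of its viewpoints.
        for sensor in sorted(remaining):
            for view, cells in sorted(candidates[sensor].items()):
                gain = len(cells - covered)   # marginal coverage gain
                if gain > best_gain:
                    best, best_gain = (sensor, view), gain
        if best is None:                      # no viewpoints left to choose
            break
        sensor, view = best
        chosen[sensor] = view
        covered |= candidates[sensor][view]
        remaining.remove(sensor)              # partition constraint: one per sensor
    return chosen, covered

candidates = {
    "cam_a": {"v1": {1, 2, 3}, "v2": {4}},
    "cam_b": {"v1": {1, 2, 3}, "v2": {4, 5}},
}
chosen, covered = greedy_partition_matroid(candidates)
```

In this toy input, once `cam_a` takes `v1`, the overlapping `v1` of `cam_b` has zero marginal gain, so the greedy rule steers `cam_b` to `v2`; this is the overlap-avoiding behavior a submodular utility encourages.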
