Jan Peters

Robot Learning from Randomized Simulations: A Review

Nov 01, 2021

Fabio Muratore, Fabio Ramos, Greg Turk, Wenhao Yu, Michael Gienger, Jan Peters

Abstract: The rise of deep learning has caused a paradigm shift in robotics research, favoring methods that require large amounts of data. It is prohibitively expensive to generate such data sets on a physical platform. Therefore, state-of-the-art approaches learn in simulation, where data generation is fast as well as inexpensive, and subsequently transfer the knowledge to the real robot (sim-to-real). Despite becoming increasingly realistic, all simulators are by construction based on models, hence inevitably imperfect. This raises the question of how simulators can be modified to facilitate learning robot control policies and overcome the mismatch between simulation and reality, often called the 'reality gap'. We provide a comprehensive review of sim-to-real research for robotics, focusing on a technique named 'domain randomization', which is a method for learning from randomized simulations.

* submitted to Frontiers in Robotics and AI
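
Domain randomization, in its simplest form, resamples the simulator's physical parameters before every training episode, so the policy is trained on a distribution of models instead of a single, inevitably imperfect one. The sketch below assumes independent uniform ranges; the parameter names and ranges are hypothetical placeholders, not values from the review.

```python
import random
from typing import Dict, Iterator, Tuple

# Hypothetical parameter ranges for a pole-balancing task (illustrative only).
PARAM_RANGES: Dict[str, Tuple[float, float]] = {
    "pole_mass": (0.08, 0.12),       # kg
    "pole_length": (0.45, 0.55),     # m
    "joint_damping": (0.0, 0.01),    # N*m*s/rad
    "friction_coeff": (0.5, 1.0),    # dimensionless
}


def sample_domain_params(rng: random.Random,
                         ranges: Dict[str, Tuple[float, float]] = PARAM_RANGES,
                         ) -> Dict[str, float]:
    """Draw one simulator configuration; each parameter is uniform in its range."""
    return {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}


def randomized_training_configs(num_episodes: int,
                                seed: int = 0) -> Iterator[Dict[str, float]]:
    """Yield one freshly randomized configuration per training episode.

    A training loop would apply each configuration to the simulator before
    the corresponding rollout (that simulator call is not shown here).
    """
    rng = random.Random(seed)
    for _ in range(num_episodes):
        yield sample_domain_params(rng)
```

Calling `list(randomized_training_configs(3, seed=42))` returns three different configurations, and reusing the seed reproduces them. Swapping the uniform draw for another distribution (for example a Gaussian around nominal values) only changes `sample_domain_params`.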

A Differentiable Newton-Euler Algorithm for Real-World Robotics

Oct 24, 2021

Michael Lutter, Johannes Silberbauer, Joe Watson, Jan Peters

Abstract: Obtaining dynamics models is essential for robotics to achieve accurate model-based controllers and simulators for planning. The dynamics models are typically obtained using the model specification of the manufacturer or simple numerical methods such as linear regression. However, this approach does not guarantee physically plausible parameters and can only be applied to kinematic chains consisting of rigid bodies. In this article, we describe a differentiable simulator that can be used to identify the system parameters of real-world mechanical systems with complex friction models as well as holonomic and non-holonomic constraints. To guarantee physically consistent parameters, we utilize virtual parameters and gradient-based optimization. The described Differentiable Newton-Euler Algorithm (DiffNEA) can be applied to a class of dynamical systems and guarantees physically plausible predictions. The extensive experimental evaluation shows that the proposed model learning approach learns accurate dynamics models of systems with complex friction and non-holonomic constraints. Especially in the offline reinforcement learning experiments, the identified DiffNEA models excel. For the challenging ball-in-a-cup task, these models solve the task using model-based offline reinforcement learning on the physical system. The black-box baselines fail on this task both in simulation and on the physical system, despite using more data for learning the model.

* arXiv admin note: text overlap with arXiv:2011.01734
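
The abstract credits virtual parameters with keeping identified parameters physically consistent. The exact DiffNEA mapping is not given here, so the sketch below shows one common construction of this kind, assuming a single rigid body with a diagonal inertia tensor in its principal frame. Unconstrained values (the ones gradient descent would update) are mapped to a positive mass and to principal moments that satisfy the triangle inequality by construction. All function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

# Small floor so outputs stay strictly positive even if softplus underflows.
MIN_VALUE = 1e-8


def softplus(x: float) -> float:
    """Numerically stable softplus, log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def virtual_to_physical(theta: Sequence[float]
                        ) -> Tuple[float, Tuple[float, float, float]]:
    """Map 4 unconstrained virtual parameters to (mass, principal inertia).

    theta[0]   -> mass, kept positive via softplus.
    theta[1:4] -> non-negative second moments (sx, sy, sz) of the mass
                  distribution; the principal moments are then
                  Ixx = sy + sz, Iyy = sx + sz, Izz = sx + sy.
    """
    if len(theta) != 4:
        raise ValueError("expected 4 virtual parameters")
    mass = softplus(theta[0]) + MIN_VALUE
    sx, sy, sz = (softplus(t) + MIN_VALUE for t in theta[1:4])
    return mass, (sy + sz, sx + sz, sx + sy)


def is_physically_consistent(mass: float,
                             inertia: Tuple[float, float, float]) -> bool:
    """Positive mass and moments, plus the triangle inequality on the moments."""
    ixx, iyy, izz = inertia
    return (mass > 0 and min(inertia) > 0
            and ixx + iyy >= izz and ixx + izz >= iyy and iyy + izz >= ixx)
```

Because every physical parameter is a smooth function of `theta`, the optimizer can move `theta` freely while the identified mass and inertia stay valid.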

Learning Stable Vector Fields on Lie Groups

Oct 22, 2021

Julen Urain, Davide Tateo, Jan Peters

Abstract: Learning robot motions from demonstration requires models that are able to represent vector fields for the full robot pose when the task is defined in operational space. Recent advances in reactive motion generation have shown that it is possible to learn adaptive, reactive, smooth, and stable vector fields. However, these approaches define vector fields on a flat Euclidean manifold, whereas representing vector fields for orientations requires modeling the dynamics on non-Euclidean manifolds, such as Lie groups. In this paper, we present a novel vector field model that can guarantee most of the properties of previous approaches, i.e., stability, smoothness, and reactivity, beyond the Euclidean space. In the experimental evaluation, we show the performance of our proposed vector field model in learning stable vector fields for full robot poses, such as SE(2) and SE(3), in both simulated and real robotics tasks.

* ICRA RA-L preprint
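
To make the Lie-group setting concrete: on SO(2), the displacement to the goal is measured with the logarithmic map, and a velocity along it is applied with the exponential map. The sketch below is a simple hand-designed proportional field, not the learned model from the paper; rotations are represented by their angle, and `gain`, `dt`, and the function names are assumed for illustration. Because the log map wraps the error into [-pi, pi], the field rotates along the shorter arc and the goal is a stable equilibrium.

```python
import math
from typing import List


def so2_log(angle: float) -> float:
    """Log map of SO(2) for a rotation given by its angle: wrap into [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def stable_field(theta: float, goal: float, gain: float = 1.0) -> float:
    """Angular velocity -gain * Log(R_goal^-1 R_theta), pointing to the goal."""
    return -gain * so2_log(theta - goal)


def rollout(theta0: float, goal: float, dt: float = 0.01, steps: int = 2000,
            gain: float = 1.0) -> List[float]:
    """Integrate the field on the group: Exp of the scaled velocity, then compose."""
    theta = so2_log(theta0)
    trajectory = [theta]
    for _ in range(steps):
        omega = stable_field(theta, goal, gain)
        # On SO(2), composing with Exp(dt * omega) is angle addition plus wrapping.
        theta = so2_log(theta + dt * omega)
        trajectory.append(theta)
    return trajectory
```

For example, `rollout(3.0, -3.0)` crosses the +-pi seam instead of sweeping through zero, because the wrapped error between the two angles is only about 0.28 rad. On SE(2) and SE(3) the same pattern applies with the corresponding log and exp maps, which are more involved.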

Continuous-Time Fitted Value Iteration for Robust Policies

Oct 05, 2021

Michael Lutter, Boris Belousov, Shie Mannor, Dieter Fox, Animesh Garg, Jan Peters

Abstract: Solving the Hamilton-Jacobi-Bellman equation is important in many domains, including control, robotics, and economics. Especially for continuous control, solving this differential equation and its extension, the Hamilton-Jacobi-Isaacs equation, is important, as it yields the optimal policy that achieves the maximum reward on a given task. In the case of the Hamilton-Jacobi-Isaacs equation, which includes an adversary controlling the environment and minimizing the reward, the obtained policy is also robust to perturbations of the dynamics. In this paper, we propose continuous fitted value iteration (cFVI) and robust fitted value iteration (rFVI). These algorithms leverage the non-linear control-affine dynamics and separable state and action reward of many continuous control problems to derive the optimal policy and optimal adversary in closed form. This analytic expression simplifies the differential equations and enables us to solve for the optimal value function using value iteration for continuous actions and states as well as the adversarial case. Notably, the resulting algorithms do not require discretization of states or actions. We apply the resulting algorithms to the Furuta pendulum and cartpole. We show that both algorithms obtain the optimal policy. The Sim2Real robustness experiments on the physical systems show that the policies successfully achieve the task in the real world. When changing the masses of the pendulum, we observe that robust value iteration is more robust than deep reinforcement learning algorithms and the non-robust version of the algorithm. Videos of the experiments are shown at https://sites.google.com/view/rfvi

* arXiv admin note: text overlap with arXiv:2105.12189
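
The closed-form policy follows from the control-affine structure. Assume dynamics dx/dt = a(x) + B(x) u and a separable reward r(x) - 0.5 u^T R u with a diagonal, positive R. The quadratic action cost is chosen here for simplicity; the paper's separable rewards may use other convex action costs. Maximizing the action-dependent part of the Hamiltonian, dV/dx^T B(x) u - 0.5 u^T R u, over u gives u* = R^-1 B(x)^T dV/dx. A minimal pure-Python sketch with hypothetical names:

```python
from typing import List, Sequence

Matrix = List[List[float]]


def optimal_action(B: Matrix, dV_dx: Sequence[float],
                   R_diag: Sequence[float]) -> List[float]:
    """u* = R^-1 B^T dV/dx for control-affine dynamics and cost 0.5 u^T R u.

    B has shape n x m (state dim x action dim), dV_dx has length n,
    and R_diag holds the m positive diagonal entries of R.
    """
    n, m = len(B), len(R_diag)
    if len(dV_dx) != n or any(len(row) != m for row in B):
        raise ValueError("shape mismatch")
    if any(r <= 0 for r in R_diag):
        raise ValueError("R must be positive definite")
    # (B^T dV/dx)_i = sum_j B[j][i] * dV_dx[j]
    bt_grad = [sum(B[j][i] * dV_dx[j] for j in range(n)) for i in range(m)]
    return [g / r for g, r in zip(bt_grad, R_diag)]


def hamiltonian_action_terms(u: Sequence[float], B: Matrix,
                             dV_dx: Sequence[float],
                             R_diag: Sequence[float]) -> float:
    """Action-dependent part of the Hamiltonian: dV/dx^T B u - 0.5 u^T R u."""
    n, m = len(B), len(R_diag)
    bu = [sum(B[j][i] * u[i] for i in range(m)) for j in range(n)]
    drift = sum(g * v for g, v in zip(dV_dx, bu))
    cost = 0.5 * sum(r * ui * ui for r, ui in zip(R_diag, u))
    return drift - cost
```

For a two-dimensional state with B = [[0.0], [1.0]], gradient [0.5, 2.0], and R = [4.0], the optimal action is 0.5; nudging it in either direction lowers `hamiltonian_action_terms`. For the cartpole or the Furuta pendulum, B(x) and the value-function gradient would come from the system model and the learned value function.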

Combining Physics and Deep Learning to learn Continuous-Time Dynamics Models

Oct 05, 2021

Michael Lutter, Jan Peters

Abstract: Deep learning has been widely used within learning algorithms for robotics. One disadvantage of deep networks is that they are black-box representations. Therefore, the learned approximations ignore the existing knowledge of physics or robotics. Especially for learning dynamics models, these black-box models are not desirable, as the underlying principles are well understood and standard deep networks can learn dynamics that violate these principles. To learn dynamics models with deep networks that guarantee physically plausible dynamics, we introduce physics-inspired deep networks that combine first principles from physics with deep learning. We incorporate Lagrangian mechanics within the model learning such that all approximated models adhere to the laws of physics and conserve energy. Deep Lagrangian Networks (DeLaN) parametrize the system energy using two networks. The parameters are obtained by minimizing the squared residual of the Euler-Lagrange differential equation. Therefore, the resulting model does not require specific knowledge of the individual system, is interpretable, and can be used as a forward, inverse, and energy model. Previously, these properties were only obtained when using system identification techniques that require knowledge of the kinematic structure. We apply DeLaN to learning dynamics models and apply these models to control simulated and physical rigid body systems. The results show that the proposed approach obtains dynamics models that can be applied to physical systems for real-time control. Compared to standard deep networks, the physics-inspired networks learn better models and capture the underlying structure of the dynamics.
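
The structure DeLaN builds on is the Euler-Lagrange inverse dynamics. With Lagrangian L = 0.5 qd^T H(q) qd - V(q), the torque is tau = H(q) qdd + dH/dt qd - 0.5 d(qd^T H(q) qd)/dq + dV/dq. In DeLaN, networks parametrize the energy; in the sketch below, plain Python functions stand in for those networks (an assumption for illustration), the mass matrix is formed as H = L L^T from a lower-triangular L with positive diagonal, and derivatives are taken by central finite differences instead of automatic differentiation. `squared_residual` mirrors the squared Euler-Lagrange residual that the abstract names as the training objective; all names are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def mass_matrix_from_cholesky(L: Matrix) -> Matrix:
    """H = L L^T; positive definite when L is lower triangular with positive diagonal."""
    n = len(L)
    return [[sum(L[i][k] * L[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inverse_dynamics(H: Callable[[Vector], Matrix], V: Callable[[Vector], float],
                     q: Sequence[float], qd: Sequence[float], qdd: Sequence[float],
                     eps: float = 1e-6) -> Vector:
    """tau = H qdd + dH/dt qd - 0.5 d(qd^T H qd)/dq + dV/dq (central differences)."""
    n = len(q)

    def shifted(k: int, delta: float) -> Vector:
        out = list(q)
        out[k] += delta
        return out

    # dH[k] approximates the partial derivative of H with respect to q_k.
    dH = []
    for k in range(n):
        h_plus, h_minus = H(shifted(k, eps)), H(shifted(k, -eps))
        dH.append([[(h_plus[i][j] - h_minus[i][j]) / (2.0 * eps) for j in range(n)]
                   for i in range(n)])

    h0 = H(list(q))
    tau = []
    for i in range(n):
        inertial = sum(h0[i][j] * qdd[j] for j in range(n))
        # (dH/dt qd)_i, using dH/dt = sum_k dH/dq_k * qd_k
        h_dot_qd = sum(dH[k][i][j] * qd[k] * qd[j] for k in range(n) for j in range(n))
        # 0.5 * qd^T (dH/dq_i) qd, the partial of the kinetic energy w.r.t. q_i
        kinetic_grad = 0.5 * sum(qd[a] * dH[i][a][b] * qd[b]
                                 for a in range(n) for b in range(n))
        potential_grad = (V(shifted(i, eps)) - V(shifted(i, -eps))) / (2.0 * eps)
        tau.append(inertial + h_dot_qd - kinetic_grad + potential_grad)
    return tau


def squared_residual(tau_measured: Sequence[float],
                     tau_predicted: Sequence[float]) -> float:
    """Squared error between measured and predicted torques."""
    return sum((a - b) ** 2 for a, b in zip(tau_measured, tau_predicted))
```

For a pendulum with mass m and length l, passing `H = lambda q: mass_matrix_from_cholesky([[math.sqrt(m) * l]])` and `V = lambda q: m * 9.81 * l * (1 - math.cos(q[0]))` recovers tau = m l^2 qdd + m g l sin(q).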

A Robot Cluster for Reproducible Research in Dexterous Manipulation

Sep 22, 2021

Stefan Bauer, Felix Widmaier, Manuel Wüthrich, Niklas Funk, Julen Urain De Jesus, Jan Peters, Joe Watson, Claire Chen, Krishnan Srinivasan, Junwu Zhang (+19 more)

Abstract: Dexterous manipulation remains an open problem in robotics. To coordinate the efforts of the research community towards tackling this problem, we propose a shared benchmark. We designed and built robotic platforms that are hosted at the MPI-IS and can be accessed remotely. Each platform consists of three robotic fingers that are capable of dexterous object manipulation. Users are able to control the platforms remotely by submitting code that is executed automatically, akin to a computational cluster. Using this setup, i) we host robotics competitions, where teams from anywhere in the world access our platforms to tackle challenging tasks, ii) we publish the datasets collected during these competitions (consisting of hundreds of robot hours), and iii) we give researchers access to these platforms for their own projects.

An Empirical Analysis of Measure-Valued Derivatives for Policy Gradients

Jul 20, 2021

João Carvalho, Davide Tateo, Fabio Muratore, Jan Peters

Abstract: Reinforcement learning methods for robotics are increasingly successful due to the constant development of better policy gradient techniques. A precise (low variance) and accurate (low bias) gradient estimator is crucial to face increasingly complex tasks. Traditional policy gradient algorithms use the likelihood-ratio trick, which is known to produce unbiased but high-variance estimates. More modern approaches exploit the reparametrization trick, which gives lower-variance gradient estimates but requires differentiable value function approximators. In this work, we study a different type of stochastic gradient estimator: the Measure-Valued Derivative. This estimator is unbiased, has low variance, and can be used with differentiable and non-differentiable function approximators. We empirically evaluate this estimator in the actor-critic policy gradient setting and show that it can reach comparable performance with methods based on the likelihood-ratio or reparametrization tricks, in both low- and high-dimensional action spaces.
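
For a one-dimensional Gaussian N(mu, sigma^2), the measure-valued derivative of E[f(a)] with respect to mu splits the derivative of the density into a positive and a negative part: d/dmu E[f(a)] = 1/(sigma sqrt(2 pi)) * (E[f(mu + sigma W)] - E[f(mu - sigma W)]), where W follows a Weibull distribution with scale sqrt(2) and shape 2 (a standard Rayleigh distribution). Only evaluations of f are needed, so f, standing in here for a critic Q(s, .) at a fixed state, need not be differentiable. The sketch below covers only the mean of a single action dimension and uses hypothetical names; it reuses the same W for both parts (common random numbers), a variance-reduction choice that keeps the estimator unbiased.

```python
import math
import random
from typing import Callable


def mvd_mean_gradient(f: Callable[[float], float], mu: float, sigma: float,
                      num_samples: int = 20000, seed: int = 0) -> float:
    """Measure-valued derivative estimate of d/dmu E_{a ~ N(mu, sigma^2)}[f(a)]."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    rng = random.Random(seed)
    c = 1.0 / (sigma * math.sqrt(2.0 * math.pi))  # normalizing constant
    total = 0.0
    for _ in range(num_samples):
        # Weibull(scale=sqrt(2), shape=2) is the standard Rayleigh distribution.
        w = rng.weibullvariate(math.sqrt(2.0), 2.0)
        # Positive part mu + sigma*w, negative part mu - sigma*w (shared w).
        total += f(mu + sigma * w) - f(mu - sigma * w)
    return c * total / num_samples
```

With f(a) = a^2 the exact gradient is 2 mu, and with the non-differentiable step f(a) = 1 if a > 0 it is the Gaussian density at mu/sigma divided by sigma; the estimator handles both without any derivative of f.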

Efficient and Reactive Planning for High Speed Robot Air Hockey

Jul 14, 2021

Puze Liu, Davide Tateo, Haitham Bou-Ammar, Jan Peters

Abstract: Highly dynamic robotic tasks require high-speed and reactive robots. These tasks are particularly challenging due to physical constraints, hardware limitations, and the high uncertainty of the dynamics and sensor measurements. To face these issues, it is crucial to design robotic agents that generate precise and fast trajectories and react immediately to environmental changes. Air hockey is an example of this kind of task. Due to the environment's characteristics, it is possible to formalize the problem and derive clean mathematical solutions. For these reasons, this environment is well suited to pushing the performance of currently available general-purpose robotic manipulators to the limit. Using two KUKA iiwa 14 arms, we show how to design a policy for general-purpose robotic manipulators for the air hockey game. We demonstrate that a real robot arm can perform fast-hitting movements and that the two robots can play against each other on a medium-size air hockey table in simulation.

* 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
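
As an illustration of the clean mathematical structure the abstract refers to, a puck on an idealized table (no friction, perfectly elastic side-wall bounces) can be predicted in closed form by unfolding the reflections. This is a simplified example under those assumptions, not the planner from the paper; the coordinate convention (x along the table, y across it, walls at +-half_width) and all names are hypothetical.

```python
from typing import Tuple


def fold_into_table(y: float, half_width: float) -> float:
    """Map an unfolded lateral coordinate back onto [-half_width, half_width].

    Elastic bounces off walls at +-half_width mirror the motion, so the true
    position is a triangle wave of the straight-line position with period
    4 * half_width.
    """
    period = 4.0 * half_width
    s = (y + half_width) % period  # s in [0, period)
    if s > 2.0 * half_width:
        s = period - s
    return s - half_width


def predict_puck(x0: float, y0: float, vx: float, vy: float, t: float,
                 half_width: float) -> Tuple[float, float]:
    """Puck position at time t under straight-line motion with wall reflections."""
    return x0 + vx * t, fold_into_table(y0 + vy * t, half_width)


def time_to_reach(x0: float, vx: float, x_target: float) -> float:
    """Time until the puck crosses x_target; inf if it never does."""
    if vx == 0 or (x_target - x0) / vx < 0:
        return float("inf")
    return (x_target - x0) / vx
```

Combining the two, `predict_puck(x0, y0, vx, vy, time_to_reach(x0, vx, x_hit), half_width)` gives where the puck crosses a hypothetical hitting line `x_hit`. A real table adds friction and imperfect restitution, which this sketch ignores.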

High-Dimensional Bayesian Optimisation with Variational Autoencoders and Deep Metric Learning

Jun 16, 2021

Antoine Grosnit, Rasul Tutunov, Alexandre Max Maraval, Ryan-Rhys Griffiths, Alexander I. Cowen-Rivers, Lin Yang, Lin Zhu, Wenlong Lyu, Zhitang Chen, Jun Wang (+2 more)

Abstract: We introduce a method based on deep metric learning to perform Bayesian optimisation over high-dimensional, structured input spaces using variational autoencoders (VAEs). By extending ideas from supervised deep metric learning, we address a longstanding problem in high-dimensional VAE Bayesian optimisation, namely how to enforce a discriminative latent space as an inductive bias. Importantly, we achieve such an inductive bias using just 1% of the available labelled data relative to previous work, highlighting the sample efficiency of our approach. As a theoretical contribution, we present a proof of vanishing regret for our method. As an empirical contribution, we present state-of-the-art results on real-world high-dimensional black-box optimisation problems, including property-guided molecule generation. We hope that the results presented in this paper can act as a guiding principle for realising effective high-dimensional Bayesian optimisation.
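
A standard building block of supervised deep metric learning is the triplet margin loss: it pulls latent codes of points with similar objective values together and pushes dissimilar ones at least a margin further away, which is one way to make a latent space discriminative. The sketch below shows a plain triplet loss with a hypothetical label-based rule for forming triplets (`similarity_threshold`); the paper's exact loss and triplet selection may differ.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 1.0) -> float:
    """max(0, d(anchor, positive) - d(anchor, negative) + margin)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def build_triplets(labels: Sequence[float],
                   similarity_threshold: float) -> List[Tuple[int, int, int]]:
    """Index triplets (a, p, n) with |y_a - y_p| < threshold <= |y_a - y_n|."""
    triplets = []
    for a, ya in enumerate(labels):
        for p, yp in enumerate(labels):
            if p == a or abs(ya - yp) >= similarity_threshold:
                continue
            for n, yn in enumerate(labels):
                if abs(ya - yn) >= similarity_threshold:
                    triplets.append((a, p, n))
    return triplets


def mean_triplet_loss(latents: Sequence[Sequence[float]], labels: Sequence[float],
                      similarity_threshold: float, margin: float = 1.0) -> float:
    """Average triplet loss over all label-consistent triplets (0 if none exist)."""
    triplets = build_triplets(labels, similarity_threshold)
    if not triplets:
        return 0.0
    return sum(triplet_loss(latents[a], latents[p], latents[n], margin)
               for a, p, n in triplets) / len(triplets)
```

With objective values [0.0, 0.1, 5.0], the first two points form positive pairs and the third is their negative; the loss is zero when the latent codes already reflect this ordering and grows when they do not.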

Robust Value Iteration for Continuous Control Tasks

May 25, 2021

Michael Lutter, Shie Mannor, Jan Peters, Dieter Fox, Animesh Garg

Abstract: When transferring a control policy from simulation to a physical system, the policy needs to be robust to variations in the dynamics to perform well. Commonly, the optimal policy overfits to the approximate model and the corresponding state distribution, often resulting in failure to transfer due to the underlying distributional shifts. In this paper, we present Robust Fitted Value Iteration, which uses dynamic programming to compute the optimal value function on the compact state domain and incorporates adversarial perturbations of the system dynamics. The adversarial perturbations encourage an optimal policy that is robust to changes in the dynamics. Utilizing the continuous-time perspective of reinforcement learning, we derive the optimal perturbations for the states, actions, observations, and model parameters in closed form. Notably, the resulting algorithm does not require discretization of states or actions. Therefore, the optimal adversarial perturbations can be efficiently incorporated in the min-max value function update. We apply the resulting algorithm to the physical Furuta pendulum and cartpole. By changing the masses of the systems, we evaluate the quantitative and qualitative performance across different model parameters. We show that robust value iteration is more robust than deep reinforcement learning algorithms and the non-robust version of the algorithm. Videos of the experiments are shown at https://sites.google.com/view/rfvi

* Accepted Paper at Robotics: Science and Systems
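
The closed-form adversary can be illustrated for one perturbation type, the action channel. Assume control-affine dynamics dx/dt = a(x) + B(x)(u + xi) and a perturbation xi constrained to an L2 ball of radius alpha (both assumptions for this sketch). The adversary that minimizes the first-order value change dV/dx^T B(x) xi then picks xi* = -alpha * B(x)^T dV/dx / ||B(x)^T dV/dx||, the steepest-descent direction scaled to the ball's boundary. Names are hypothetical, and the state, observation, and parameter cases are not reproduced.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def worst_case_action_perturbation(B: Matrix, dV_dx: Sequence[float],
                                   alpha: float) -> List[float]:
    """xi* = argmin_{||xi|| <= alpha} dV/dx^T B xi = -alpha * B^T dV/dx / ||B^T dV/dx||."""
    n, m = len(B), len(B[0])
    bt_grad = [sum(B[j][i] * dV_dx[j] for j in range(n)) for i in range(m)]
    norm = math.sqrt(sum(g * g for g in bt_grad))
    if norm == 0.0:
        # Every admissible perturbation has zero first-order effect.
        return [0.0] * m
    return [-alpha * g / norm for g in bt_grad]


def value_change_rate(B: Matrix, dV_dx: Sequence[float],
                      xi: Sequence[float]) -> float:
    """First-order effect of the perturbation on dV/dt: dV/dx^T B xi."""
    n, m = len(B), len(xi)
    return sum(dV_dx[j] * sum(B[j][i] * xi[i] for i in range(m)) for j in range(n))
```

In a robust value-iteration update, this worst-case term would enter the min-max target, so the learned value function accounts for the most harmful admissible perturbation at every state.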
