Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Oleg Arenz

Context-Aware Deep Lagrangian Networks for Model Predictive Control

Jun 18, 2025

Lucas Schulze, Jan Peters, Oleg Arenz

Abstract:Controlling a robot based on physics-informed dynamic models, such as deep Lagrangian networks (DeLaN), can improve the generalizability and interpretability of the resulting behavior. However, in complex environments, the number of objects to potentially interact with is vast, and their physical properties are often uncertain. This complexity makes it infeasible to employ a single global model. Therefore, we need to resort to online system identification of context-aware models that capture only the currently relevant aspects of the environment. While physical principles such as the conservation of energy may not hold across varying contexts, ensuring physical plausibility for any individual context-aware model can still be highly desirable, particularly when using it for receding horizon control methods such as Model Predictive Control (MPC). Hence, in this work, we extend DeLaN to make it context-aware, combine it with a recurrent network for online system identification, and integrate it with a MPC for adaptive, physics-informed control. We also combine DeLaN with a residual dynamics model to leverage the fact that a nominal model of the robot is typically available. We evaluate our method on a 7-DOF robot arm for trajectory tracking under varying loads. Our method reduces the end-effector tracking error by 39%, compared to a 21% improvement achieved by a baseline that uses an extended Kalman filter.

* Accepted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2025

Via

Access Paper or Ask Questions

Maximum Total Correlation Reinforcement Learning

May 22, 2025

Bang You, Puze Liu, Huaping Liu, Jan Peters, Oleg Arenz

Abstract:Simplicity is a powerful inductive bias. In reinforcement learning, regularization is used for simpler policies, data augmentation for simpler representations, and sparse reward functions for simpler objectives, all that, with the underlying motivation to increase generalizability and robustness by focusing on the essentials. Supplementary to these techniques, we investigate how to promote simple behavior throughout the episode. To that end, we introduce a modification of the reinforcement learning problem that additionally maximizes the total correlation within the induced trajectories. We propose a practical algorithm that optimizes all models, including policy and state representation, based on a lower-bound approximation. In simulated robot environments, our method naturally generates policies that induce periodic and compressible trajectories, and that exhibit superior robustness to noise and changes in dynamics compared to baseline methods, while also improving performance in the original tasks.

* ICML 2025

Via

Access Paper or Ask Questions

Learning from Less: Guiding Deep Reinforcement Learning with Differentiable Symbolic Planning

May 16, 2025

Zihan Ye, Oleg Arenz, Kristian Kersting

Abstract:When tackling complex problems, humans naturally break them down into smaller, manageable subtasks and adjust their initial plans based on observations. For instance, if you want to make coffee at a friend's place, you might initially plan to grab coffee beans, go to the coffee machine, and pour them into the machine. Upon noticing that the machine is full, you would skip the initial steps and proceed directly to brewing. In stark contrast, state of the art reinforcement learners, such as Proximal Policy Optimization (PPO), lack such prior knowledge and therefore require significantly more training steps to exhibit comparable adaptive behavior. Thus, a central research question arises: \textit{How can we enable reinforcement learning (RL) agents to have similar ``human priors'', allowing the agent to learn with fewer training interactions?} To address this challenge, we propose differentiable symbolic planner (Dylan), a novel framework that integrates symbolic planning into Reinforcement Learning. Dylan serves as a reward model that dynamically shapes rewards by leveraging human priors, guiding agents through intermediate subtasks, thus enabling more efficient exploration. Beyond reward shaping, Dylan can work as a high level planner that composes primitive policies to generate new behaviors while avoiding common symbolic planner pitfalls such as infinite execution loops. Our experimental evaluations demonstrate that Dylan significantly improves RL agents' performance and facilitates generalization to unseen tasks.

* conference paper, 9 pages

Via

Access Paper or Ask Questions

Machine Learning with Physics Knowledge for Prediction: A Survey

Aug 19, 2024

Joe Watson, Chen Song, Oliver Weeger, Theo Gruner, An T. Le, Kay Hansel, Ahmed Hendawy, Oleg Arenz, Will Trojak, Miles Cranmer(+5 more)

Figure 1 for Machine Learning with Physics Knowledge for Prediction: A Survey

Figure 2 for Machine Learning with Physics Knowledge for Prediction: A Survey

Figure 3 for Machine Learning with Physics Knowledge for Prediction: A Survey

Figure 4 for Machine Learning with Physics Knowledge for Prediction: A Survey

Abstract:This survey examines the broad suite of methods and models for combining machine learning with physics knowledge for prediction and forecast, with a focus on partial differential equations. These methods have attracted significant interest due to their potential impact on advancing scientific research and industrial practices by improving predictive models with small- or large-scale datasets and expressive predictive models with useful inductive biases. The survey has two parts. The first considers incorporating physics knowledge on an architectural level through objective functions, structured predictive models, and data augmentation. The second considers data as physics knowledge, which motivates looking at multi-task, meta, and contextual learning as an alternative approach to incorporating physics knowledge in a data-driven fashion. Finally, we also provide an industrial perspective on the application of these methods and a survey of the open-source ecosystem for physics-informed machine learning.

* 56 pages, 8 figures, 2 tables

Via

Access Paper or Ask Questions

LS-IQ: Implicit Reward Regularization for Inverse Reinforcement Learning

Mar 01, 2023

Firas Al-Hafez, Davide Tateo, Oleg Arenz, Guoping Zhao, Jan Peters

Figure 1 for LS-IQ: Implicit Reward Regularization for Inverse Reinforcement Learning

Figure 2 for LS-IQ: Implicit Reward Regularization for Inverse Reinforcement Learning

Figure 3 for LS-IQ: Implicit Reward Regularization for Inverse Reinforcement Learning

Figure 4 for LS-IQ: Implicit Reward Regularization for Inverse Reinforcement Learning

Abstract:Recent methods for imitation learning directly learn a $Q$-function using an implicit reward formulation rather than an explicit reward function. However, these methods generally require implicit reward regularization to improve stability and often mistreat absorbing states. Previous works show that a squared norm regularization on the implicit reward function is effective, but do not provide a theoretical analysis of the resulting properties of the algorithms. In this work, we show that using this regularizer under a mixture distribution of the policy and the expert provides a particularly illuminating perspective: the original objective can be understood as squared Bellman error minimization, and the corresponding optimization problem minimizes a bounded $\chi^2$-Divergence between the expert and the mixture distribution. This perspective allows us to address instabilities and properly treat absorbing states. We show that our method, Least Squares Inverse Q-Learning (LS-IQ), outperforms state-of-the-art algorithms, particularly in environments with absorbing states. Finally, we propose to use an inverse dynamics model to learn from observations only. Using this approach, we retain performance in settings where no expert actions are available.

Via

Access Paper or Ask Questions

A Unified Perspective on Natural Gradient Variational Inference with Gaussian Mixture Models

Sep 23, 2022

Oleg Arenz, Philipp Dahlinger, Zihan Ye, Michael Volpp, Gerhard Neumann

Figure 1 for A Unified Perspective on Natural Gradient Variational Inference with Gaussian Mixture Models

Figure 2 for A Unified Perspective on Natural Gradient Variational Inference with Gaussian Mixture Models

Figure 3 for A Unified Perspective on Natural Gradient Variational Inference with Gaussian Mixture Models

Figure 4 for A Unified Perspective on Natural Gradient Variational Inference with Gaussian Mixture Models

Abstract:Variational inference with Gaussian mixture models (GMMs) enables learning of highly-tractable yet multi-modal approximations of intractable target distributions. GMMs are particular relevant for problem settings with up to a few hundred dimensions, for example in robotics, for modelling distributions over trajectories or joint distributions. This work focuses on two very effective methods for GMM-based variational inference that both employ independent natural gradient updates for the individual components and the categorical distribution of the weights. We show for the first time, that their derived updates are equivalent, although their practical implementations and theoretical guarantees differ. We identify several design choices that distinguish both approaches, namely with respect to sample selection, natural gradient estimation, stepsize adaptation, and whether trust regions are enforced or the number of components adapted. We perform extensive ablations on these design choices and show that they strongly affect the efficiency of the optimization and the variability of the learned distribution. Based on our insights, we propose a novel instantiation of our generalized framework, that combines first-order natural gradient estimates with trust-regions and component adaption, and significantly outperforms both previous methods in all our experiments.

Via

Access Paper or Ask Questions

Self-supervised Sequential Information Bottleneck for Robust Exploration in Deep Reinforcement Learning

Sep 12, 2022

Bang You, Jingming Xie, Youping Chen, Jan Peters, Oleg Arenz

Figure 1 for Self-supervised Sequential Information Bottleneck for Robust Exploration in Deep Reinforcement Learning

Figure 2 for Self-supervised Sequential Information Bottleneck for Robust Exploration in Deep Reinforcement Learning

Figure 3 for Self-supervised Sequential Information Bottleneck for Robust Exploration in Deep Reinforcement Learning

Figure 4 for Self-supervised Sequential Information Bottleneck for Robust Exploration in Deep Reinforcement Learning

Abstract:Effective exploration is critical for reinforcement learning agents in environments with sparse rewards or high-dimensional state-action spaces. Recent works based on state-visitation counts, curiosity and entropy-maximization generate intrinsic reward signals to motivate the agent to visit novel states for exploration. However, the agent can get distracted by perturbations to sensor inputs that contain novel but task-irrelevant information, e.g. due to sensor noise or changing background. In this work, we introduce the sequential information bottleneck objective for learning compressed and temporally coherent representations by modelling and compressing sequential predictive information in time-series observations. For efficient exploration in noisy environments, we further construct intrinsic rewards that capture task-relevant state novelty based on the learned representations. We derive a variational upper bound of our sequential information bottleneck objective for practical optimization and provide an information-theoretic interpretation of the derived upper bound. Our experiments on a set of challenging image-based simulated control tasks show that our method achieves better sample efficiency, and robustness to both white noise and natural video backgrounds compared to state-of-art methods based on curiosity, entropy maximization and information-gain.

* 14 pages

Via

Access Paper or Ask Questions

Integrating Contrastive Learning with Dynamic Models for Reinforcement Learning from Images

Mar 02, 2022

Bang You, Oleg Arenz, Youping Chen, Jan Peters

Figure 1 for Integrating Contrastive Learning with Dynamic Models for Reinforcement Learning from Images

Figure 2 for Integrating Contrastive Learning with Dynamic Models for Reinforcement Learning from Images

Figure 3 for Integrating Contrastive Learning with Dynamic Models for Reinforcement Learning from Images

Figure 4 for Integrating Contrastive Learning with Dynamic Models for Reinforcement Learning from Images

Abstract:Recent methods for reinforcement learning from images use auxiliary tasks to learn image features that are used by the agent's policy or Q-function. In particular, methods based on contrastive learning that induce linearity of the latent dynamics or invariance to data augmentation have been shown to greatly improve the sample efficiency of the reinforcement learning algorithm and the generalizability of the learned embedding. We further argue, that explicitly improving Markovianity of the learned embedding is desirable and propose a self-supervised representation learning method which integrates contrastive learning with dynamic models to synergistically combine these three objectives: (1) We maximize the InfoNCE bound on the mutual information between the state- and action-embedding and the embedding of the next state to induce a linearly predictive embedding without explicitly learning a linear transition model, (2) we further improve Markovianity of the learned embedding by explicitly learning a non-linear transition model using regression, and (3) we maximize the mutual information between the two nonlinear predictions of the next embeddings based on the current action and two independent augmentations of the current state, which naturally induces transformation invariance not only for the state embedding, but also for the nonlinear transition model. Experimental evaluation on the Deepmind control suite shows that our proposed method achieves higher sample efficiency and better generalization than state-of-art methods based on contrastive learning or reconstruction.

* Neurocomputing 476(2022)102-114
* 28 pages, 11 figures, 5 tables

Via

Access Paper or Ask Questions

Assisted Teleoperation in Changing Environments with a Mixture of Virtual Guides

Aug 12, 2020

Marco Ewerton, Oleg Arenz, Jan Peters

Figure 1 for Assisted Teleoperation in Changing Environments with a Mixture of Virtual Guides

Figure 2 for Assisted Teleoperation in Changing Environments with a Mixture of Virtual Guides

Figure 3 for Assisted Teleoperation in Changing Environments with a Mixture of Virtual Guides

Figure 4 for Assisted Teleoperation in Changing Environments with a Mixture of Virtual Guides

Abstract:Haptic guidance is a powerful technique to combine the strengths of humans and autonomous systems for teleoperation. The autonomous system can provide haptic cues to enable the operator to perform precise movements; the operator can interfere with the plan of the autonomous system leveraging his/her superior cognitive capabilities. However, providing haptic cues such that the individual strengths are not impaired is challenging because low forces provide little guidance, whereas strong forces can hinder the operator in realizing his/her plan. Based on variational inference, we learn a Gaussian mixture model (GMM) over trajectories to accomplish a given task. The learned GMM is used to construct a potential field which determines the haptic cues. The potential field smoothly changes during teleoperation based on our updated belief over the plans and their respective phases. Furthermore, new plans are learned online when the operator does not follow any of the proposed plans, or after changes in the environment. User studies confirm that our framework helps users perform teleoperation tasks more accurately than without haptic cues and, in some cases, faster. Moreover, we demonstrate the use of our framework to help a subject teleoperate a 7 DoF manipulator in a pick-and-place task.

* Advanced Robotics, 2020
* 19 pages, 9 figures

Via

Access Paper or Ask Questions

Non-Adversarial Imitation Learning and its Connections to Adversarial Methods

Aug 08, 2020

Oleg Arenz, Gerhard Neumann

Abstract:Many modern methods for imitation learning and inverse reinforcement learning, such as GAIL or AIRL, are based on an adversarial formulation. These methods apply GANs to match the expert's distribution over states and actions with the implicit state-action distribution induced by the agent's policy. However, by framing imitation learning as a saddle point problem, adversarial methods can suffer from unstable optimization, and convergence can only be shown for small policy updates. We address these problems by proposing a framework for non-adversarial imitation learning. The resulting algorithms are similar to their adversarial counterparts and, thus, provide insights for adversarial imitation learning methods. Most notably, we show that AIRL is an instance of our non-adversarial formulation, which enables us to greatly simplify its derivations and obtain stronger convergence guarantees. We also show that our non-adversarial formulation can be used to derive novel algorithms by presenting a method for offline imitation learning that is inspired by the recent ValueDice algorithm, but does not rely on small policy updates for convergence. In our simulated robot experiments, our offline method for non-adversarial imitation learning seems to perform best when using many updates for policy and discriminator at each iteration and outperforms behavioral cloning and ValueDice.

Via

Access Paper or Ask Questions