Dominik Baumann

Safe Bayesian optimization across noise models via scenario programming

Dec 12, 2025

Abdullah Tokmak, Thomas B. Schön, Dominik Baumann

Abstract: Safe Bayesian optimization (BO) with Gaussian processes is an effective tool for tuning control policies in safety-critical real-world systems, specifically due to its sample efficiency and safety guarantees. However, most safe BO algorithms assume homoscedastic sub-Gaussian measurement noise, an assumption that does not hold in many relevant applications. In this article, we propose a straightforward yet rigorous approach for safe BO across noise models, including homoscedastic sub-Gaussian and heteroscedastic heavy-tailed distributions. We provide a high-probability bound on the measurement noise via the scenario approach, integrate these bounds into high-probability confidence intervals, and prove safety and optimality for our proposed safe BO algorithm. We deploy our algorithm in synthetic examples and in tuning a controller for the Franka Emika manipulator in simulation.

* Accepted for publication (IEEE Control Systems Letters)
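
A minimal sketch of the kind of scenario-based noise bound the abstract above describes; it is not the authors' algorithm. It assumes i.i.d. noise samples are available and relies on the classical scenario-approach result for a convex program with a single decision variable: if the bound is the largest absolute sample out of N, the probability that a fresh noise value exceeds it is larger than eps with probability at most (1 - eps)^N. The function names `scenario_sample_size` and `noise_bound` are hypothetical.

```python
import math


def scenario_sample_size(eps: float, beta: float) -> int:
    """Smallest N with (1 - eps)**N <= beta (one decision variable)."""
    if not (0.0 < eps < 1.0 and 0.0 < beta < 1.0):
        raise ValueError("eps and beta must lie strictly between 0 and 1")
    return math.ceil(math.log(beta) / math.log(1.0 - eps))


def noise_bound(samples):
    """Scenario solution of: minimize b subject to |e_i| <= b for all samples."""
    if not samples:
        raise ValueError("need at least one noise sample")
    return max(abs(e) for e in samples)
```

Here `eps` is the tolerated probability that a future noise value violates the bound and `beta` is the allowed failure probability of the guarantee itself; how such a bound enters the confidence intervals is specific to the paper and not reproduced here.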

Online Bayesian Experimental Design for Partially Observed Dynamical Systems

Nov 06, 2025

Sara Pérez-Vieites, Sahel Iqbal, Simo Särkkä, Dominik Baumann

Abstract: Bayesian experimental design (BED) provides a principled framework for optimizing data collection, but existing approaches do not apply to crucial real-world settings such as dynamical systems with partial observability, where only noisy and incomplete observations are available. These systems are naturally modeled as state-space models (SSMs), where latent states mediate the link between parameters and data, making the likelihood -- and thus information-theoretic objectives like the expected information gain (EIG) -- intractable. In addition, the dynamical nature of the system requires online algorithms that update posterior distributions and select designs sequentially in a computationally efficient manner. We address these challenges by deriving new estimators of the EIG and its gradient that explicitly marginalize latent states, enabling scalable stochastic optimization in nonlinear SSMs. Our approach leverages nested particle filters (NPFs) for efficient online inference with convergence guarantees. Applications to realistic models, such as the susceptible-infected-recovered (SIR) model and a moving source location task, show that our framework successfully handles both partial observability and online computation.

* 19 pages, 5 figures
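
To make the partially observed state-space setting in the abstract above concrete, here is a small sketch of an SIR model whose latent state is only seen through noisy measurements. Everything beyond "SIR with partial observability" is an assumption for illustration: the Euler discretization, the Gaussian observation noise, observing only the infected fraction, and the name `simulate_sir`. No design variable, EIG estimator, or particle filter is modeled here.

```python
import random


def simulate_sir(beta, gamma, i0, steps, dt=0.1, obs_std=0.01, seed=0):
    """Euler-discretized SIR model on population fractions.

    Returns the latent (s, i, r) trajectory and noisy observations of i only.
    """
    rng = random.Random(seed)
    s, i, r = 1.0 - i0, i0, 0.0
    states, observations = [], []
    for _ in range(steps):
        new_infections = beta * s * i * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        states.append((s, i, r))
        # Partial observability: only the infected fraction is measured, with noise.
        observations.append(i + rng.gauss(0.0, obs_std))
    return states, observations
```

An inference method in this setting would receive only `observations` and would have to recover `beta` and `gamma` without access to `states`.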

Efficient Human-Aware Task Allocation for Multi-Robot Systems in Shared Environments

Aug 27, 2025

Maryam Kazemi Eskeri, Ville Kyrki, Dominik Baumann, Tomasz Piotr Kucner

Abstract: Multi-robot systems are increasingly deployed in applications such as intralogistics or autonomous delivery, where multiple robots collaborate to complete tasks efficiently. One of the key factors enabling their efficient cooperation is Multi-Robot Task Allocation (MRTA). Algorithms solving this problem optimize task distribution among robots to minimize the overall execution time. In shared environments, apart from the relative distance between the robots and the tasks, the execution time is also significantly impacted by the delay caused by navigating around moving people. However, most existing MRTA approaches are dynamics-agnostic, relying on static maps and neglecting human motion patterns, leading to inefficiencies and delays. In this paper, we introduce a human-aware MRTA method. This method leverages Maps of Dynamics (MoDs), spatio-temporal queryable models designed to capture historical human movement patterns, to estimate the impact of humans on the task execution time during deployment. The method utilizes a stochastic cost function that includes MoDs. Experimental results show that integrating MoDs enhances task allocation performance, reducing mission completion times by up to 26% compared to the dynamics-agnostic method and up to 19% compared to the baseline. This work underscores the importance of considering human dynamics in MRTA within shared environments and presents an efficient framework for deploying multi-robot systems in environments populated by humans.

* 7 pages, 4 figures, accepted at IROS 2025

A Lightweight Crowd Model for Robot Social Navigation

Aug 27, 2025

Maryam Kazemi Eskeri, Thomas Wiedemann, Ville Kyrki, Dominik Baumann, Tomasz Piotr Kucner

Abstract: Robots operating in human-populated environments must navigate safely and efficiently while minimizing social disruption. Achieving this requires estimating crowd movement to avoid congested areas in real time. Traditional microscopic models struggle to scale in dense crowds due to high computational cost, while existing macroscopic crowd prediction models tend to be either overly simplistic or computationally intensive. In this work, we propose a lightweight, real-time macroscopic crowd prediction model tailored for human motion, which balances prediction accuracy and computational efficiency. Our approach simplifies both spatial and temporal processing based on the inherent characteristics of pedestrian flow, enabling robust generalization without the overhead of complex architectures. We demonstrate a 3.6-fold reduction in inference time while improving prediction accuracy by 3.1%. Integrated into a socially aware planning framework, the model enables efficient and socially compliant robot navigation in dynamic environments. This work highlights that efficient human crowd modeling enables robots to navigate dense environments without costly computations.

* 7 pages, 6 figures, accepted in ECMR 2025

Safety and optimality in learning-based control at low computational cost

May 12, 2025

Dominik Baumann, Krzysztof Kowalczyk, Cristian R. Rojas, Koen Tiels, Pawel Wachel

Abstract: Applying machine learning methods to physical systems that are supposed to act in the real world requires providing safety guarantees. However, methods that include such guarantees often come at a high computational cost, making them inapplicable to large datasets and embedded devices with low computational power. In this paper, we propose CoLSafe, a computationally lightweight safe learning algorithm whose computational complexity grows sublinearly with the number of data points. We derive both safety and optimality guarantees and showcase the effectiveness of our algorithm on a seven-degrees-of-freedom robot arm.

* Accepted final version to appear in the IEEE Transactions on Automatic Control

Safe exploration in reproducing kernel Hilbert spaces

Mar 13, 2025

Abdullah Tokmak, Kiran G. Krishnan, Thomas B. Schön, Dominik Baumann

Abstract: Popular safe Bayesian optimization (BO) algorithms learn control policies for safety-critical systems in unknown environments. However, most algorithms make a smoothness assumption, which is encoded by a known bounded norm in a reproducing kernel Hilbert space (RKHS). The RKHS is a potentially infinite-dimensional space, and it remains unclear how to reliably obtain the RKHS norm of an unknown function. In this work, we propose a safe BO algorithm capable of estimating the RKHS norm from data. We provide statistical guarantees on the RKHS norm estimation, integrate the estimated RKHS norm into existing confidence intervals and show that we retain theoretical guarantees, and prove safety of the resulting safe BO algorithm. We apply our algorithm to safely optimize reinforcement learning policies on physics simulators and on a real inverted pendulum, demonstrating improved performance, safety, and scalability compared to the state-of-the-art.

* Accepted to AISTATS 2025

Transfer Learning in Latent Contextual Bandits with Covariate Shift Through Causal Transportability

Feb 27, 2025

Mingwei Deng, Ville Kyrki, Dominik Baumann

Abstract: Transferring knowledge from one environment to another is an essential ability of intelligent systems. Nevertheless, when two environments are different, naively transferring all knowledge may deteriorate the performance, a phenomenon known as negative transfer. In this paper, we address this issue within the framework of multi-armed bandits from the perspective of causal inference. Specifically, we consider transfer learning in latent contextual bandits, where the actual context is hidden, but a potentially high-dimensional proxy is observable. We further consider a covariate shift in the context across environments. We show that naively transferring all knowledge for classical bandit algorithms in this setting leads to negative transfer. We then leverage transportability theory from causal inference to develop algorithms that explicitly transfer effective knowledge for estimating the causal effects of interest in the target environment. In addition, we utilize variational autoencoders to approximate causal effects in the presence of a high-dimensional proxy. We test our algorithms on synthetic and semi-synthetic datasets, empirically demonstrating consistently improved learning efficiency across different proxies compared to baseline algorithms, showing the effectiveness of our causal framework in transferring knowledge.

* Accepted at the Conference on Causal Learning and Reasoning (CLeaR 2025), will be published in the Proceedings of Machine Learning Research

Simulation-Aided Policy Tuning for Black-Box Robot Learning

Nov 21, 2024

Shiming He, Alexander von Rohr, Dominik Baumann, Ji Xiang, Sebastian Trimpe

Abstract: How can robots learn and adapt to new tasks and situations with little data? Systematic exploration and simulation are crucial tools for efficient robot learning. We present a novel black-box policy search algorithm focused on data-efficient policy improvements. The algorithm learns directly on the robot and treats simulation as an additional information source to speed up the learning process. At the core of the algorithm, a probabilistic model learns the relationship between the policy parameters and the robot learning objective not only by performing experiments on the robot, but also by leveraging data from a simulator. This substantially reduces interaction time with the robot. Using this model, we can guarantee improvements with high probability for each policy update, thereby facilitating fast, goal-oriented learning. We evaluate our algorithm on simulated fine-tuning tasks and demonstrate the data-efficiency of the proposed dual-information source optimization algorithm. In a real robot learning experiment, we show fast and successful task learning on a robot manipulator with the aid of an imperfect simulator.

PACSBO: Probably approximately correct safe Bayesian optimization

Sep 02, 2024

Abdullah Tokmak, Thomas B. Schön, Dominik Baumann

Abstract: Safe Bayesian optimization (BO) algorithms promise to find optimal control policies without knowing the system dynamics while at the same time guaranteeing safety with high probability. In exchange for those guarantees, popular algorithms require a smoothness assumption: a known upper bound on a norm in a reproducing kernel Hilbert space (RKHS). The RKHS is a potentially infinite-dimensional space, and it is unclear how to, in practice, obtain an upper bound of an unknown function in its corresponding RKHS. In response, we propose an algorithm that estimates an upper bound on the RKHS norm of an unknown function from data and investigate its theoretical properties. Moreover, akin to Lipschitz-based methods, we treat the RKHS norm as a local rather than a global object, and thus reduce conservatism. Integrating the RKHS norm estimation and the local interpretation of the RKHS norm into a safe BO algorithm yields PACSBO, an algorithm for probably approximately correct safe Bayesian optimization, for which we provide numerical and hardware experiments that demonstrate its applicability and benefits over popular safe BO algorithms.

* Accepted to the Symposium on Systems Theory in Data and Optimization (SysDO 2024). This is a preprint of the final version, which is to appear in Lecture Notes in Control and Information Sciences - Proceedings

Safe reinforcement learning in uncertain contexts

Jan 11, 2024

Dominik Baumann, Thomas B. Schön

Abstract: When deploying machine learning algorithms in the real world, guaranteeing safety is an essential asset. Existing safe learning approaches typically consider continuous variables, i.e., regression tasks. However, in practice, robotic systems are also subject to discrete, external environmental changes, e.g., having to carry objects of certain weights or operating on frozen, wet, or dry surfaces. Such influences can be modeled as discrete context variables. In the existing literature, such contexts are, if considered, mostly assumed to be known. In this work, we drop this assumption and show how we can perform safe learning when we cannot directly measure the context variables. To achieve this, we derive frequentist guarantees for multi-class classification, allowing us to estimate the current context from measurements. Further, we propose an approach for identifying contexts through experiments. We discuss under which conditions we can retain theoretical guarantees and demonstrate the applicability of our algorithm on a Furuta pendulum with camera measurements of different weights that serve as contexts.

* Accepted final version to appear in the IEEE Transactions on Robotics
