Deep Reinforcement Learning (RL) agents are susceptible to adversarial noise in their observations that can mislead their policies and decrease their performance. However, an adversary may be interested not only in decreasing the reward, but also in modifying specific temporal logic properties of the policy. This paper presents a metric that measures the exact impact of adversarial attacks against such properties. We use this metric to craft optimal adversarial attacks. Furthermore, we introduce a model checking method that allows us to verify the robustness of RL policies against adversarial attacks. Our empirical analysis confirms (1) the quality of our metric for crafting adversarial attacks against temporal logic properties, and (2) that we are able to concisely assess a system's robustness against attacks.
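To make the idea of a property-impact metric concrete, the following minimal sketch (the MDP, policy, and perturbation are purely illustrative assumptions, not taken from the paper) compares the probability of a reachability property under a fixed policy with and without an observation perturbation; the absolute difference serves as the attack's impact on the property.

```python
# Illustrative sketch (not the paper's implementation): measuring how much an
# observation perturbation changes the probability of the property
# "eventually reach the target state" under a fixed policy.
import numpy as np

# Toy MDP: 4 states, 2 actions; P[s, a, s'] are transition probabilities.
P = np.array([
    [[0.0, 0.9, 0.1, 0.0], [0.0, 0.1, 0.9, 0.0]],   # state 0
    [[0.0, 0.0, 0.2, 0.8], [0.6, 0.0, 0.4, 0.0]],   # state 1
    [[0.5, 0.0, 0.5, 0.0], [0.0, 0.0, 0.0, 1.0]],   # state 2
    [[0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 1.0]],   # state 3 (target, absorbing)
])
policy = {0: 0, 1: 0, 2: 1, 3: 0}          # action chosen per *observed* state
target, init = 3, 0

def reach_prob(perturb):
    """Probability of eventually reaching `target` when the policy acts on
    perturbed observations perturb[s] instead of the true state s."""
    p = np.zeros(len(P)); p[target] = 1.0
    for _ in range(1000):                   # fixed-point iteration
        new = np.array([P[s, policy[perturb[s]]] @ p for s in range(len(P))])
        new[target] = 1.0
        if np.max(np.abs(new - p)) < 1e-12:
            break
        p = new
    return p[init]

identity = {s: s for s in range(len(P))}
attack   = {0: 0, 1: 2, 2: 1, 3: 3}         # adversary swaps observations 1 and 2

clean, attacked = reach_prob(identity), reach_prob(attack)
impact = abs(clean - attacked)              # property-impact metric of the attack
print(f"P(reach target) clean={clean:.3f}, attacked={attacked:.3f}, impact={impact:.3f}")
```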
Automated synthesis of provably correct controllers for cyber-physical systems is crucial for deploying these systems in safety-critical scenarios. However, their hybrid features and stochastic or unknown behaviours make this synthesis problem challenging. In this paper, we propose a method for synthesizing controllers for Markov jump linear systems (MJLSs), a particular class of cyber-physical systems, that certifiably satisfy a requirement expressed as a specification in probabilistic computation tree logic (PCTL). An MJLS consists of a finite set of linear dynamics with unknown additive disturbances, where jumps between these modes are governed by a Markov decision process (MDP). We consider both the case where the transition function of this MDP is given by probability intervals and the case where it is completely unknown. Our approach is based on generating a finite-state abstraction which captures both the discrete and the continuous behaviour of the original system. We formalise this abstraction as an interval Markov decision process (iMDP): intervals of transition probabilities are computed using sampling techniques from the so-called "scenario approach", resulting in a probabilistically sound approximation of the MJLS. This iMDP abstracts both the jump dynamics between modes and the continuous dynamics within the modes. To demonstrate the efficacy of our technique, we apply our method to multiple realistic benchmark problems, including temperature control and aerial vehicle delivery problems.
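As a minimal illustration of the system class, the sketch below simulates an MJLS: a set of linear modes with additive disturbances, where the active mode jumps according to a (here fixed) Markov transition matrix. All matrices, the noise model, and the placeholder controller are assumptions for illustration only, not the benchmark systems from the paper.

```python
# Minimal sketch of a Markov jump linear system (MJLS): a finite set of linear
# modes with additive disturbance, where the active mode jumps according to a
# Markov transition matrix. All numbers are illustrative.
import numpy as np

rng = np.random.default_rng(0)

A = [np.array([[1.0, 0.1], [0.0, 1.0]]),     # mode 0 dynamics
     np.array([[0.9, 0.2], [0.0, 0.8]])]     # mode 1 dynamics
B = [np.array([[0.0], [0.1]]),
     np.array([[0.0], [0.2]])]
T = np.array([[0.95, 0.05],                  # mode jump probabilities
              [0.10, 0.90]])

def step(x, mode, u):
    w = rng.normal(0.0, 0.05, size=2)        # unknown additive disturbance
    x_next = A[mode] @ x + (B[mode] @ u).ravel() + w
    mode_next = rng.choice(2, p=T[mode])
    return x_next, mode_next

x, mode = np.array([1.0, 0.0]), 0
for k in range(5):
    u = np.array([-0.5 * x[0]])              # placeholder feedback controller
    x, mode = step(x, mode, u)
    print(k, mode, np.round(x, 3))
```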
Capturing uncertainty in models of complex dynamical systems is crucial to designing safe controllers. Stochastic noise causes aleatoric uncertainty, whereas imprecise knowledge of model parameters and the presence of external disturbances lead to epistemic uncertainty. Several approaches use formal abstractions to synthesize policies that satisfy temporal specifications related to safety and reachability. However, the underlying models exclusively capture aleatoric but not epistemic uncertainty, and thus require that model parameters and disturbances are known precisely. Our contribution to overcoming this restriction is a novel abstraction-based controller synthesis method for continuous-state models with stochastic noise, uncertain parameters, and external disturbances. Using sampling techniques and robust analysis, we capture both aleatoric and epistemic uncertainty, with a user-specified confidence level, in the transition probability intervals of a so-called interval Markov decision process (iMDP). We then synthesize an optimal policy on this abstract iMDP, which translates (with the specified confidence level) to a feedback controller for the continuous model, with the same performance guarantees. Our experimental benchmarks confirm that accounting for epistemic uncertainty leads to controllers that are more robust against variations in parameter values.
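The synthesis step on the abstract model can be pictured as robust value iteration on an iMDP, where an adversary resolves each probability interval to its worst case and the policy maximises against that adversary. The toy intervals and the inner worst-case computation below are an illustrative sketch under these assumptions, not the paper's implementation.

```python
# Sketch of robust value iteration on an interval MDP (iMDP): transition
# probabilities are only known to lie in intervals [lo, hi]; the adversary
# resolves them to the worst case. All intervals below are illustrative.
import numpy as np

def worst_case_dist(lo, hi, values):
    """Distribution within [lo, hi] (summing to 1) minimising expected value."""
    p = lo.copy()
    budget = 1.0 - lo.sum()
    for i in np.argsort(values):             # give spare mass to low-value successors
        add = min(hi[i] - lo[i], budget)
        p[i] += add
        budget -= add
    return p

# 3 states (state 2 = goal, absorbing), 2 actions; intervals per (s, a, s').
lo = np.array([[[0.1, 0.2, 0.3], [0.0, 0.6, 0.1]],
               [[0.2, 0.1, 0.4], [0.3, 0.0, 0.5]],
               [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]])
hi = np.array([[[0.4, 0.5, 0.6], [0.2, 0.8, 0.4]],
               [[0.4, 0.3, 0.7], [0.5, 0.2, 0.7]],
               [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]])

v = np.zeros(3); v[2] = 1.0                   # lower bound on P(reach goal)
for _ in range(100):
    q = np.array([[worst_case_dist(lo[s, a], hi[s, a], v) @ v
                   for a in range(2)] for s in range(3)])
    v_new = q.max(axis=1); v_new[2] = 1.0
    if np.max(np.abs(v_new - v)) < 1e-10:
        break
    v = v_new

print("robust reachability values:", np.round(v, 3))
print("maximin policy:", q.argmax(axis=1))
```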
We address the problem of safe reinforcement learning from pixel observations. Inherent challenges in such settings are (1) a trade-off between reward optimization and adhering to safety constraints, (2) partial observability, and (3) high-dimensional observations. We formalize the problem in a constrained, partially observable Markov decision process framework, where an agent obtains distinct reward and safety signals. To address the curse of dimensionality, we employ a novel safety critic using the stochastic latent actor-critic (SLAC) approach. The latent variable model predicts rewards and safety violations, and we use the safety critic to train safe policies. Using well-known benchmark environments, we demonstrate competitive performance over existing approaches with respect to computational requirements, final reward return, and satisfying the safety constraints.
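A minimal sketch of the role of a safety critic, assuming toy critic values rather than the learned SLAC-based latent model from the paper: action selection is restricted to actions whose predicted safety cost stays within a budget, falling back to the least unsafe action if none qualifies.

```python
# Illustrative sketch (not the SLAC-based architecture from the paper): action
# selection with a separate safety critic. The reward critic q_r and safety
# critic q_c would normally be learned; here they are toy values.
import numpy as np

q_r = np.array([1.0, 2.5, 0.3])      # expected return per action in the current (latent) state
q_c = np.array([0.05, 0.40, 0.01])   # predicted cost / probability of a safety violation
safety_budget = 0.1

def safe_greedy(q_r, q_c, budget):
    feasible = np.where(q_c <= budget)[0]
    if feasible.size > 0:
        return int(feasible[np.argmax(q_r[feasible])])  # best reward among safe actions
    return int(np.argmin(q_c))                          # no safe action: pick the least unsafe

print("chosen action:", safe_greedy(q_r, q_c, safety_budget))
```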
This paper presents COOL-MC, a tool that integrates state-of-the-art reinforcement learning (RL) and model checking. Specifically, the tool builds upon the OpenAI gym and the probabilistic model checker Storm. COOL-MC provides the following features: (1) a simulator to train RL policies in the OpenAI gym for Markov decision processes (MDPs) that are defined as input for Storm, (2) a new model builder for Storm, which uses callback functions to verify (neural network) RL policies, (3) formal abstractions that relate models and policies specified in OpenAI gym or Storm, and (4) algorithms to obtain bounds on the performance of so-called permissive policies. We describe the components and architecture of COOL-MC and demonstrate its features on multiple benchmark environments.
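Conceptually, feature (2) can be pictured as follows: while the model is built, a callback queries the trained policy so that only the successors of the chosen action are expanded, yielding the Markov chain induced by the policy, which can then be model checked. The sketch below is purely conceptual and does not use COOL-MC's or Storm's actual APIs; the toy MDP and the stand-in policy are assumptions for illustration.

```python
# Conceptual sketch of policy-restricted model building: starting from the
# initial state, a callback queries the policy for its action, and only the
# successors of that action are expanded. Everything here is illustrative.
from collections import deque

# Toy MDP: state -> action -> list of (successor, probability).
mdp = {
    0: {"a": [(1, 0.5), (2, 0.5)], "b": [(2, 1.0)]},
    1: {"a": [(3, 1.0)], "b": [(0, 0.3), (3, 0.7)]},
    2: {"a": [(2, 1.0)]},
    3: {"a": [(3, 1.0)]},
}

def policy_callback(state):
    """Stand-in for querying a (neural network) policy."""
    return "b" if state in (0, 1) else "a"

def build_induced_chain(mdp, policy, init=0):
    chain, frontier, seen = {}, deque([init]), {init}
    while frontier:
        s = frontier.popleft()
        action = policy(s)                       # callback into the policy
        chain[s] = mdp[s][action]
        for succ, _ in chain[s]:
            if succ not in seen:
                seen.add(succ)
                frontier.append(succ)
    return chain                                 # Markov chain induced by the policy

print(build_induced_chain(mdp, policy_callback))
```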
Cost-effective asset management is an area of interest across several industries. Specifically, this paper develops a deep reinforcement learning (DRL) solution to automatically determine an optimal rehabilitation policy for continuously deteriorating water pipes. We approach the problem of rehabilitation planning in both an online and an offline DRL setting. In online DRL, the agent interacts with a simulated environment of multiple pipes with distinct length, material, and failure rate characteristics. We train the agent using a deep Q-network (DQN) to learn an optimal policy with minimal average costs and reduced failure probability. In offline DRL, the agent uses static data, e.g., DQN replay data, to learn an optimal policy via a conservative Q-learning algorithm without further interaction with the environment. We demonstrate that DRL-based policies improve over standard preventive, corrective, and greedy planning alternatives. Additionally, learning from the fixed DQN replay dataset surpasses the online DQN setting. The results indicate that existing deterioration profiles of water pipes, consisting of large and diverse state and action trajectories, provide a valuable avenue for learning rehabilitation policies in the offline setting without needing a simulator.
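For intuition, the following toy sketch runs tabular Q-learning on a single deteriorating pipe with do-nothing/repair/replace actions. All states, costs, and probabilities are illustrative assumptions, whereas the paper trains deep Q-networks over multiple pipes and uses conservative Q-learning in the offline setting.

```python
# Toy sketch of the online setting: tabular Q-learning on a single pipe whose
# condition deteriorates stochastically. All numbers are illustrative.
import numpy as np

rng = np.random.default_rng(1)
N_STATES, ACTIONS = 5, ["nothing", "repair", "replace"]   # 0 = new, 4 = failed
ACTION_COST = np.array([0.0, 2.0, 6.0])
FAILURE_COST = 20.0

def step(s, a):
    if a == 2:                        # replace: pipe is as good as new
        s = 0
    elif a == 1:                      # repair: condition improves by one level
        s = max(0, s - 1)
    if rng.random() < 0.3:            # stochastic deterioration
        s = min(N_STATES - 1, s + 1)
    cost = ACTION_COST[a] + (FAILURE_COST if s == N_STATES - 1 else 0.0)
    return s, -cost                   # reward = negative cost

Q, alpha, gamma, eps = np.zeros((N_STATES, 3)), 0.1, 0.95, 0.1
s = 0
for _ in range(50_000):
    a = rng.integers(3) if rng.random() < eps else int(Q[s].argmax())
    s2, r = step(s, a)
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])   # Q-learning update
    s = s2

print("learned policy per condition level:", [ACTIONS[a] for a in Q.argmax(axis=1)])
```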
Markov decision processes (MDPs) are formal models commonly used in sequential decision-making. MDPs capture the stochasticity that may arise, for instance, from imprecise actuators via probabilities in the transition function. However, in data-driven applications, deriving precise probabilities from (limited) data introduces statistical errors that may lead to unexpected or undesirable outcomes. Uncertain MDPs (uMDPs) do not require precise probabilities but instead use so-called uncertainty sets in the transitions, accounting for such limited data. Tools from the formal verification community efficiently compute robust policies that provably adhere to formal specifications, like safety constraints, under the worst-case instance in the uncertainty set. We continuously learn the transition probabilities of an MDP in a robust anytime-learning approach that combines a dedicated Bayesian inference scheme with the computation of robust policies. In particular, our method (1) approximates probabilities as intervals, (2) adapts to new data that may be inconsistent with an intermediate model, and (3) may be stopped at any time to compute a robust policy on the uMDP that faithfully captures the data so far. We show the effectiveness of our approach and compare it to robust policies computed on uMDPs learned by the UCRL2 reinforcement learning algorithm in an experimental evaluation on several benchmarks.
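A simplified sketch of step (1), turning transition counts into probability intervals: below, each transition probability receives an equal-tailed credible interval from a Beta posterior over the observed counts. The paper's dedicated Bayesian inference scheme differs; this is only meant to illustrate the idea of interval-valued probability estimates that tighten as more data arrives.

```python
# Simplified sketch: probability intervals from transition counts via a Beta
# posterior. Prior, confidence level, and counts are illustrative.
from scipy.stats import beta

def interval_from_counts(k, n, prior=(1.0, 1.0), confidence=0.9):
    """Credible interval for the probability of a transition observed k times
    out of n samples of (s, a), under a Beta(prior) prior."""
    a, b = prior[0] + k, prior[1] + (n - k)
    tail = (1.0 - confidence) / 2.0
    return beta.ppf(tail, a, b), beta.ppf(1.0 - tail, a, b)

# Example: action observed 40 times, 13 of which went to successor s'.
lo, hi = interval_from_counts(k=13, n=40)
print(f"interval for P(s' | s, a): [{lo:.3f}, {hi:.3f}]")
```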
Reinforcement learning (RL) in safety-critical environments requires an agent to avoid decisions with catastrophic consequences. Various approaches addressing the safety of RL exist to mitigate this problem. In particular, so-called shields provide formal safety guarantees on the behavior of RL agents based on (partial) models of the agents' environment. Yet, the state-of-the-art generally assumes perfect sensing capabilities of the agents, which is unrealistic in real-life applications. The standard models to capture scenarios with limited sensing are partially observable Markov decision processes (POMDPs). Safe RL for these models remains an open problem so far. We propose and thoroughly evaluate a tight integration of formally-verified shields for POMDPs with state-of-the-art deep RL algorithms and create an efficacious method that safely learns policies under partial observability. We empirically demonstrate that an RL agent using a shield, beyond being safe, converges to higher values of expected reward. Moreover, shielded agents need an order of magnitude fewer training episodes than unshielded agents, especially in challenging sparse-reward settings.
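The effect of a shield under partial observability can be pictured as follows, assuming a toy POMDP, belief, and risk threshold (all illustrative, not the formally verified shields from the paper): the shield blocks any action that, from a state carrying non-negligible belief, could reach an unsafe state.

```python
# Conceptual sketch of shielding under partial observability: the agent keeps
# a belief over states, and the shield blocks actions that may reach an
# unsafe state from any state with non-negligible belief. All numbers are
# illustrative.
P = {  # transition probabilities: P[state][action] -> list of (successor, prob)
    0: {"left": [(0, 1.0)], "right": [(1, 0.8), (2, 0.2)]},
    1: {"left": [(0, 1.0)], "right": [(2, 1.0)]},
    2: {"left": [(2, 1.0)], "right": [(2, 1.0)]},
}
UNSAFE = {2}

def allowed_actions(belief, threshold=0.05):
    safe = []
    for a in ("left", "right"):
        risky = any(belief[s] > threshold and succ in UNSAFE
                    for s in P for (succ, p) in P[s][a] if p > 0)
        if not risky:
            safe.append(a)
    return safe

belief = [0.7, 0.3, 0.0]   # uncertain whether we are in state 0 or 1
print("shield allows:", allowed_actions(belief))  # 'right' is blocked
```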
Controllers for autonomous systems that operate in safety-critical settings must account for stochastic disturbances. Such disturbances are often modelled as process noise, and common assumptions are that the underlying distributions are known and/or Gaussian. In practice, however, these assumptions may be unrealistic and can lead to poor approximations of the true noise distribution. We present a novel planning method that does not rely on any explicit representation of the noise distributions. In particular, we address the problem of computing a controller that provides probabilistic guarantees on safely reaching a target. First, we abstract the continuous system into a discrete-state model that captures noise by probabilistic transitions between states. As a key contribution, we adapt tools from the scenario approach to compute probably approximately correct (PAC) bounds on these transition probabilities, based on a finite number of samples of the noise. We capture these bounds in the transition probability intervals of a so-called interval Markov decision process (iMDP). This iMDP is robust against uncertainty in the transition probabilities, and the tightness of the probability intervals can be controlled through the number of samples. We use state-of-the-art verification techniques to provide guarantees on the iMDP, and compute a controller for which these guarantees carry over to the autonomous system. Realistic benchmarks show the practical applicability of our method, even when the iMDP has millions of states or transitions.
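A minimal sketch of the abstraction step, under illustrative dynamics and noise: successor states are sampled, mapped to abstract regions, and the empirical frequencies are widened into probability intervals. For simplicity the sketch uses a Hoeffding bound, whereas the paper derives tighter PAC intervals from the scenario approach.

```python
# Illustrative sketch: sample the (unknown) noise, record which abstract
# region each successor lands in, and turn frequencies into intervals.
# Dynamics, noise, and the grid abstraction are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)

def dynamics(x, u):
    w = rng.laplace(0.0, 0.2, size=2)          # noise of unknown distribution
    return np.array([[1.0, 0.2], [0.0, 1.0]]) @ x + np.array([0.0, 0.1]) * u + w

def region(x):
    """Map a continuous state to a discrete abstract region (grid cell index)."""
    return tuple(np.floor(x).astype(int))

N, confidence = 1000, 0.99
x, u = np.array([0.4, 0.6]), 1.0
counts = {}
for _ in range(N):
    r = region(dynamics(x, u))
    counts[r] = counts.get(r, 0) + 1

eps = np.sqrt(np.log(2.0 / (1.0 - confidence)) / (2.0 * N))   # Hoeffding radius
for r, c in sorted(counts.items(), key=lambda kv: -kv[1]):
    p_hat = c / N
    print(f"region {r}: interval [{max(0.0, p_hat - eps):.3f}, {min(1.0, p_hat + eps):.3f}]")
```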