We present a novel bilateral negotiation model that allows a self-interested agent to learn how to negotiate over multiple issues in the presence of user preference uncertainty. The model relies upon interpretable strategy templates representing the tactics the agent should employ during the negotiation and learns template parameters to maximize the average utility received over multiple negotiations, thus resulting in optimal bid acceptance and generation. Our model also uses deep reinforcement learning to evaluate threshold utility values, for those tactics that require them, thereby deriving optimal utilities for every environment state. To handle user preference uncertainty, the model relies on a stochastic search to find user model that best agrees with a given partial preference profile. Multi-objective optimization and multi-criteria decision-making methods are applied at negotiation time to generate Pareto-optimal outcomes thereby increasing the number of successful (win-win) negotiations. Rigorous experimental evaluations show that the agent employing our model outperforms the winning agents of the 10th Automated Negotiating Agents Competition (ANAC'19) in terms of individual as well as social-welfare utilities.
Even though model predictive control (MPC) is currently the main algorithm for insulin control in the artificial pancreas (AP), it usually requires complex online optimizations, which are infeasible for resource-constrained medical devices. MPC also typically relies on state estimation, an error-prone process. In this paper, we introduce a novel approach to AP control that uses Imitation Learning to synthesize neural-network insulin policies from MPC-computed demonstrations. Such policies are computationally efficient and, by instrumenting MPC at training time with full state information, they can directly map measurements into optimal therapy decisions, thus bypassing state estimation. We apply Bayesian inference via Monte Carlo Dropout to learn policies, which allows us to quantify prediction uncertainty and thereby derive safer therapy decisions. We show that our control policies trained under a specific patient model readily generalize (in terms of model parameters and disturbance distributions) to patient cohorts, consistently outperforming traditional MPC with state estimation.
We present a novel negotiation model that allows an agent to learn how to negotiate during concurrent bilateral negotiations in unknown and dynamic e-markets. The agent uses an actor-critic architecture with model-free reinforcement learning to learn a strategy expressed as a deep neural network. We pre-train the strategy by supervision from synthetic market data, thereby decreasing the exploration time required for learning during negotiation. As a result, we can build automated agents for concurrent negotiations that can adapt to different e-market settings without the need to be pre-programmed. Our experimental evaluation shows that our deep reinforcement learning-based agents outperform two existing well-known negotiation strategies in one-to-many concurrent bilateral negotiations for a range of e-market settings.
We present the Neural Simplex Architecture (NSA), a new approach to runtime assurance that provides safety guarantees for neural controllers (obtained e.g. using reinforcement learning) of complex autonomous and other cyber-physical systems without unduly sacrificing performance. NSA is inspired by the Simplex control architecture of Sha et al., but with some significant differences. In the traditional Simplex approach, the advanced controller (AC) is treated as a black box; there are no techniques for correcting the AC after it generates a potentially unsafe control input that causes a failover to the BC. Our NSA addresses this limitation. NSA not only provides safety assurances for CPSs in the presence of a possibly faulty neural controller, but can also improve the safety of such a controller in an online setting via retraining, without degrading its performance. NSA also offers reverse switching strategies, which allow the AC to resume control of the system under reasonable conditions, allowing the mission to continue unabated. Our experimental results on several significant case studies, including a target-seeking ground rover navigating an obstacle field and a neural controller for an artificial pancreas system, demonstrate NSA's benefits.
We introduce a probabilistic robustness measure for Bayesian Neural Networks (BNNs), defined as the probability that, given a test point, there exists a point within a bounded set such that the BNN prediction differs between the two. Such a measure can be used, for instance, to quantify the probability of the existence of adversarial examples. Building on statistical verification techniques for probabilistic models, we develop a framework that allows us to estimate probabilistic robustness for a BNN with statistical guarantees, i.e., with a priori error and confidence bounds. We provide experimental comparison for several approximate BNN inference techniques on image classification tasks associated to MNIST and a two-class subset of the GTSRB dataset. Our results enable quantification of uncertainty of BNN predictions in adversarial settings.
We introduce the State Classification Problem (SCP) for hybrid systems, and present Neural State Classification (NSC) as an efficient solution technique. SCP generalizes the model checking problem as it entails classifying each state $s$ of a hybrid automaton as either positive or negative, depending on whether or not $s$ satisfies a given time-bounded reachability specification. This is an interesting problem in its own right, which NSC solves using machine-learning techniques, Deep Neural Networks in particular. State classifiers produced by NSC tend to be very efficient (run in constant time and space), but may be subject to classification errors. To quantify and mitigate such errors, our approach comprises: i) techniques for certifying, with statistical guarantees, that an NSC classifier meets given accuracy levels; ii) tuning techniques, including a novel technique based on adversarial sampling, that can virtually eliminate false negatives (positive states classified as negative), thereby making the classifier more conservative. We have applied NSC to six nonlinear hybrid system benchmarks, achieving an accuracy of 99.25% to 99.98%, and a false-negative rate of 0.0033 to 0, which we further reduced to 0.0015 to 0 after tuning the classifier. We believe that this level of accuracy is acceptable in many practical applications, and that these results demonstrate the promise of the NSC approach.
We show how machine-learning techniques, particularly neural networks, offer a very effective and highly efficient solution to the approximate model-checking problem for continuous and hybrid systems, a solution where the general-purpose model checker is replaced by a model-specific classifier trained by sampling model trajectories. To the best of our knowledge, we are the first to establish this link from machine learning to model checking. Our method comprises a pipeline of analysis techniques for estimating and obtaining statistical guarantees on the classifier's prediction performance, as well as tuning techniques to improve such performance. Our experimental evaluation considers the time-bounded reachability problem for three well-established benchmarks in the hybrid systems community. On these examples, we achieve an accuracy of 99.82% to 100% and a false-negative rate (incorrectly predicting that unsafe states are not reachable from a given state) of 0.0007 to 0. We believe that this level of accuracy is acceptable in many practical applications and we show how the approximate model checker can be made more conservative by tuning the classifier through further training and selection of the classification threshold.