In this work, we consider a sequence of stochastic optimization problems following a time-varying distribution via the lens of online optimization. Assuming that the loss function satisfies the Polyak-{\L}ojasiewicz condition, we apply online stochastic gradient descent and establish its dynamic regret bound that is composed of cumulative distribution drifts and cumulative gradient biases caused by stochasticity. The distribution metric we adopt here is Wasserstein distance, which is well-defined without the absolute continuity assumption or with a time-varying support set. We also establish a regret bound of online stochastic proximal gradient descent when the objective function is regularized. Moreover, we show that the above framework can be applied to the Conditional Value-at-Risk (CVaR) learning problem. Particularly, we improve an existing proof on the discovery of the PL condition of the CVaR problem, resulting in a regret bound of online stochastic gradient descent.
This paper presents a Visual Inertial Odometry Landmark-based Simultaneous Localisation and Mapping algorithm based on a distributed block coordinate nonlinear Moving Horizon Estimation scheme. The main advantage of the proposed method is that the updates on the position of the landmarks are based on a Bundle Adjustment technique that can be parallelised over the landmarks. The performance of the method is demonstrated in simulations in different environments and with different types of robot trajectory. Circular and wiggling patterns in the trajectory lead to better estimation performance than straight ones, confirming what is expected from recent nonlinear observability theory.
In this paper, we propose and analyze a new method for online linear quadratic regulator (LQR) control with a priori unknown time-varying cost matrices. The cost matrices are revealed sequentially with the potential for future values to be previewed over a short window. Our novel method involves using the available cost matrices to predict the optimal trajectory, and a tracking controller to drive the system towards it. We adopted the notion of dynamic regret to measure the performance of this proposed online LQR control method, with our main result being that the (dynamic) regret of our method is upper bounded by a constant. Moreover, the regret upper bound decays exponentially with the preview window length, and is extendable to systems with disturbances. We show in simulations that our proposed method offers improved performance compared to other previously proposed online LQR methods.
In this paper we examine the relationship between the flow of the replicator dynamic, the continuum limit of Multiplicative Weights Update, and a game's response graph. We settle an open problem establishing that under the replicator, sink chain components -- a topological notion of long-run outcome of a dynamical system -- always exist and are approximated by the sink connected components of the game's response graph. More specifically, each sink chain component contains a sink connected component of the response graph, as well as all mixed strategy profiles whose support consists of pure profiles in the same connected component, a set we call the content of the connected component. As a corollary, all profiles are chain recurrent in games with strongly connected response graphs. In any two-player game sharing a response graph with a zero-sum game, the sink chain component is unique. In two-player zero-sum and potential games the sink chain components and sink connected components are in a one-to-one correspondence, and we conjecture that this holds in all games.
In this letter, we consider the problem of field estimation using binary measurements. Previous work has formulated the problem as a parameter estimation problem, with the parameter estimation carried out in an online manner using sequential Monte Carlo techniques. In the current work, we consider an alternative approach to the parameter estimation based on online logistic regression. The developed algorithm is less computationally intensive than the sequential Monte Carlo approach, while having more reliable estimation performance.
We propose a robust data-driven output feedback control algorithm that explicitly incorporates inherent finite-sample model estimate uncertainties into the control design. The algorithm has three components: (1) a subspace identification nominal model estimator; (2) a bootstrap resampling method that quantifies non-asymptotic variance of the nominal model estimate; and (3) a non-conventional robust control design method comprising a coupled optimal dynamic output feedback filter and controller with multiplicative noise. A key advantage of the proposed approach is that the system identification and robust control design procedures both use stochastic uncertainty representations, so that the actual inherent statistical estimation uncertainty directly aligns with the uncertainty the robust controller is being designed against. Moreover, the control design method accommodates a highly structured uncertainty representation that can capture uncertainty shape more effectively than existing approaches. We show through numerical experiments that the proposed robust data-driven output feedback controller can significantly outperform a certainty equivalent controller on various measures of sample complexity and stability robustness.
With advances in image processing and machine learning, it is now feasible to incorporate semantic information into the problem of simultaneous localisation and mapping (SLAM). Previously, SLAM was carried out using lower level geometric features (points, lines, and planes) which are often view-point dependent and error prone in visually repetitive environments. Semantic information can improve the ability to recognise previously visited locations, as well as maintain sparser maps for long term SLAM applications. However, SLAM in repetitive environments has the critical problem of assigning measurements to the landmarks which generated them. In this paper, we use k-best assignment enumeration to compute marginal assignment probabilities for each measurement landmark pair, in real time. We present numerical studies on the KITTI dataset to demonstrate the effectiveness and speed of the proposed framework.
We consider safety in simultaneous learning and control of discrete-time linear time-invariant systems. We provide rigorous confidence bounds on the learned model of the system based on the number of utilized state measurements. These bounds are used to modify control inputs to the system via an optimization problem with potentially time-varying safety constraints. We prove that the state can only exit the safe set with small probability, provided a feasible solution to the safety-constrained optimization exists. This optimization problem is then reformulated in a more computationally-friendly format by tightening the safety constraints to account for model uncertainty during learning. The tightening decreases as the confidence in the learned model improves. We finally prove that, under persistence of excitation, the tightening becomes negligible as more measurements are gathered.
In this paper we provide a formal framework for comparing the expressive power of Behavior Trees (BTs) to other action selection architectures. Taking inspiration from the analogous comparisons of structural programming methodologies, we formalise the concept of `expressiveness'. This leads us to an expressiveness hierarchy of control architectures, which includes BTs, Decision Trees (DTs), Teleo-reactive Programs (TRs) and Finite State Machines (FSMs). By distinguishing between BTs with auxiliary variables and those without, we demonstrate the existence of a trade-off in BT design between readability and expressiveness. We discuss what this means for BTs in practice.
In many learning based control methodologies, learning the unknown dynamic model precedes the control phase, while the aim is to control the system such that it remains in some safe region of the state space. In this work our aim is to guarantee safety while learning and control proceed simultaneously. Specifically, we consider the problem of safe learning in nonlinear control-affine systems subject to unknown additive uncertainty. We model uncertainty as a Gaussian signal and use state measurements to learn its mean and covariance. We provide rigorous time-varying bounds on the mean and covariance of the uncertainty and employ them to modify the control input via an optimisation program with safety constraints encoded as a barrier function on the state space. We show that with an arbitrarily large probability we can guarantee that the state will remain in the safe set, while learning and control are carried out simultaneously, provided that a feasible solution exists for the optimisation problem. We provide a secondary formulation of this optimisation that is computationally more efficient. This is based on tightening the safety constraints to counter the uncertainty about the learned mean and covariance. The magnitude of the tightening can be decreased as our confidence in the learned mean and covariance increases (i.e., as we gather more measurements about the environment). Extensions of the method are provided for Gaussian uncertainties with piecewise constant mean and covariance to accommodate more general environments.