We study the performance of the certainty equivalent controller on the Linear Quadratic Regulator (LQR) with unknown transition dynamics. We show that the sub-optimality gap between the cost incurred by playing the certainty equivalent controller on the true system and the cost incurred by using the optimal LQR controller enjoys a fast statistical rate, scaling as the square of the parameter error. Our result improves upon recent work by Dean et al. (2017), who present an algorithm achieving a sub-optimality gap linear in the parameter error. A key part of our analysis relies on perturbation bounds for discrete Riccati equations. We provide two new perturbation bounds, one that expands on an existing result from Konstantinov et al. (1993), and another based on a new elementary proof strategy. Our results show that certainty equivalent control with $\varepsilon$-greedy exploration achieves $\tilde{\mathcal{O}}(\sqrt{T})$ regret in the adaptive LQR setting, yielding the first guarantee of a computationally tractable algorithm that achieves nearly optimal regret for adaptive LQR.
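As a concrete illustration of the certainty equivalence pipeline, the following is a minimal numerical sketch (assuming a toy system, Gaussian excitation, and known costs $Q$, $R$; this is not the paper's experimental protocol): estimate $(A, B)$ by least squares from a single randomly excited trajectory, then solve the discrete Riccati equation for the estimate as if it were the true model.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

rng = np.random.default_rng(0)
n, d, T = 3, 2, 500                       # state dim, input dim, horizon
A = np.array([[0.9, 0.01, 0.0],
              [0.01, 0.9, 0.01],
              [0.0, 0.01, 0.9]])
B = rng.standard_normal((n, d))
Q, R = np.eye(n), np.eye(d)

# Collect one trajectory driven by white-noise inputs and process noise.
x = np.zeros(n)
X, U, Xn = [], [], []
for _ in range(T):
    u = rng.standard_normal(d)
    xn = A @ x + B @ u + 0.1 * rng.standard_normal(n)
    X.append(x); U.append(u); Xn.append(xn)
    x = xn

# Least-squares estimate of [A B] from the (x_t, u_t) -> x_{t+1} pairs.
Z = np.hstack([np.array(X), np.array(U)])
Theta, *_ = np.linalg.lstsq(Z, np.array(Xn), rcond=None)
A_hat, B_hat = Theta.T[:, :n], Theta.T[:, n:]

def lqr_gain(Am, Bm):
    # Certainty equivalent step: treat the model (Am, Bm) as the truth.
    P = solve_discrete_are(Am, Bm, Q, R)
    return np.linalg.solve(R + Bm.T @ P @ Bm, Bm.T @ P @ Am)

K_ce, K_opt = lqr_gain(A_hat, B_hat), lqr_gain(A, B)
print("parameter error:", np.linalg.norm(np.hstack([A_hat - A, B_hat - B]), 2))
print("gain error:", np.linalg.norm(K_ce - K_opt, 2))
```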
The effectiveness of model-based versus model-free methods is a long-standing question in reinforcement learning (RL). Motivated by recent empirical success of RL on continuous control tasks, we study the sample complexity of popular model-based and model-free algorithms on the Linear Quadratic Regulator (LQR). We show that for policy evaluation, a simple model-based plugin method requires asymptotically fewer samples than the classical least-squares temporal difference (LSTD) estimator to reach the same quality of solution; the sample complexity gap between the two methods can be at least a factor of state dimension. For policy optimization, we study a simple family of problem instances and show that nominal (certainty equivalence principle) control also requires a factor of state dimension fewer samples than the policy gradient method to reach the same level of control performance on these instances. Furthermore, the gap persists even when employing baselines commonly used in practice. To the best of our knowledge, this is the first theoretical result which demonstrates a separation in the sample complexity between model-based and model-free methods on a continuous control task.
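A hedged sketch of the model-based plugin evaluation described above, under simplifying assumptions (known stage cost under the policy, closed-loop matrix estimated directly from an on-policy trajectory); the LSTD baseline is omitted:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

rng = np.random.default_rng(1)
n, T = 3, 400
L = 0.9 * np.linalg.qr(rng.standard_normal((n, n)))[0]  # stable closed loop A + BK
Qpi = np.eye(n)                        # stage cost Q + K^T R K under the policy

# On-policy trajectory with process noise.
x = np.zeros(n)
X, Xn = [], []
for _ in range(T):
    xn = L @ x + rng.standard_normal(n)
    X.append(x); Xn.append(xn)
    x = xn

# Plugin method: estimate the closed-loop matrix by least squares ...
L_hat, *_ = np.linalg.lstsq(np.array(X), np.array(Xn), rcond=None)
L_hat = L_hat.T

# ... then solve the Lyapunov equation L^T P L - P + Qpi = 0 with L_hat plugged in.
P_hat = solve_discrete_lyapunov(L_hat.T, Qpi)
P_true = solve_discrete_lyapunov(L.T, Qpi)
print("relative value error:",
      np.linalg.norm(P_hat - P_true, 2) / np.linalg.norm(P_true, 2))
```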
The problem of estimating the $\mathcal{H}_\infty$-norm of an LTI system from noisy input/output measurements has attracted recent attention as an alternative to parameter identification for bounding unmodeled dynamics in robust control. In this paper, we study lower bounds for $\mathcal{H}_\infty$-norm estimation under a query model where at each iteration the algorithm chooses a bounded input signal and receives the response of the chosen signal corrupted by white noise. We prove that when the underlying system is an FIR filter, $\mathcal{H}_\infty$-norm estimation is no more efficient than model identification for passive sampling. For active sampling, we show that norm estimation is at most a factor of $\log{r}$ more sample efficient than model identification, where $r$ is the length of the filter. We complement our theoretical results with experiments which demonstrate that a simple non-adaptive estimator of the norm is competitive with state-of-the-art adaptive norm estimation algorithms.
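For illustration, a simple non-adaptive estimator of the kind compared in the experiments might look as follows (a toy protocol assumed here, not the paper's exact query model): fit the impulse response by least squares from white-noise excitation, then take the peak magnitude of its frequency response on a grid.

```python
import numpy as np

rng = np.random.default_rng(2)
r, T = 16, 2000                                    # filter length, samples
g = rng.standard_normal(r) / np.arange(1, r + 1)   # true impulse response

u = rng.standard_normal(T)
# Toeplitz-style regression matrix: row t holds u_t, ..., u_{t-r+1}.
U = np.stack([np.concatenate([np.zeros(k), u[: T - k]]) for k in range(r)], axis=1)
y = U @ g + 0.1 * rng.standard_normal(T)

g_hat, *_ = np.linalg.lstsq(U, y, rcond=None)

def hinf_norm(h, n_grid=4096):
    # H-infinity norm of an FIR filter = sup of |G(e^{jw})| over the unit
    # circle, approximated on a uniform frequency grid via the FFT.
    return np.abs(np.fft.fft(h, n_grid)).max()

print("true norm:     ", hinf_norm(g))
print("estimated norm:", hinf_norm(g_hat))
```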
We study the constrained linear quadratic regulator with unknown dynamics, addressing the tension between safety and exploration in data-driven control techniques. We present a framework which allows for system identification through persistent excitation, while maintaining safety by guaranteeing the satisfaction of state and input constraints. This framework involves a novel method for synthesizing robust constraint-satisfying feedback controllers, leveraging newly developed tools from system level synthesis. We connect statistical results with cost sub-optimality bounds to give non-asymptotic guarantees on both estimation and controller performance.
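The synthesis machinery is involved, but the basic safety-versus-excitation loop can be sketched as follows (illustration only: the stabilizing gain `K` and constraint levels are assumed, and the paper's robust constraint-satisfying synthesis via system level synthesis is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(3)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
K = np.array([[-1.0, -1.7]])              # assumed known stabilizing gain
x_max, u_max, eta_max = 5.0, 2.0, 0.5     # box constraints, excitation level

x = np.zeros(2)
for t in range(200):
    eta = rng.uniform(-eta_max, eta_max, size=1)   # persistent excitation
    u = np.clip(K @ x + eta, -u_max, u_max)        # respect input constraints
    x = A @ x + B @ u + 0.01 * rng.standard_normal(2)
    assert np.all(np.abs(x) <= x_max), "state constraint violated"
```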
We prove that the ordinary least-squares (OLS) estimator attains nearly minimax optimal performance for the identification of linear dynamical systems from a single observed trajectory. Our upper bound relies on a generalization of Mendelson's small-ball method to dependent data, eschewing the use of standard mixing-time arguments. Our lower bounds reveal that these upper bounds are tight up to logarithmic factors. In particular, we capture the correct signal-to-noise behavior of the problem, showing that more unstable linear systems are easier to estimate. This is qualitatively different from the picture suggested by mixing-time arguments, in which unstable systems appear more difficult to estimate. We generalize our technique to provide bounds for a more general class of linear response time-series.
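A minimal sketch of the estimator being analyzed, on a toy system that mixes stable and marginally stable modes (the regime where mixing-time arguments break down):

```python
import numpy as np

rng = np.random.default_rng(4)
n, T = 3, 1000
A = np.diag([0.5, 0.98, 1.0])   # stable and marginally stable modes

# Single trajectory of x_{t+1} = A x_t + w_t.
x = np.zeros(n)
X, Xn = [], []
for _ in range(T):
    xn = A @ x + rng.standard_normal(n)
    X.append(x); Xn.append(xn)
    x = xn

# OLS estimate of A from consecutive state pairs.
A_hat, *_ = np.linalg.lstsq(np.array(X), np.array(Xn), rcond=None)
print("operator-norm error:", np.linalg.norm(A_hat.T - A, 2))
```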
We consider adaptive control of the Linear Quadratic Regulator (LQR), where an unknown linear system is controlled subject to quadratic costs. Leveraging recent developments in the estimation of linear systems and in robust controller synthesis, we present the first provably polynomial time algorithm that provides high probability guarantees of sub-linear regret on this problem. We further study the interplay between regret minimization and parameter estimation by proving a lower bound on the expected regret in terms of the exploration schedule used by any algorithm. Finally, we conduct a numerical study comparing our robust adaptive algorithm to other methods from the adaptive LQR literature, and demonstrate the flexibility of our proposed method by extending it to a demand forecasting problem subject to state constraints.
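Schematically, the adaptive loop alternates estimation and controller re-synthesis over doubling epochs. The sketch below substitutes the certainty equivalent gain for the paper's robust system level synthesis step and uses a decaying exploration schedule; both simplifications are assumptions of the illustration.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

rng = np.random.default_rng(5)
A = np.array([[1.01, 0.01], [0.0, 1.01]])   # true system, slightly unstable
B = np.eye(2)
Q, R = np.eye(2), np.eye(2)

def gain(Am, Bm):
    # LQR gain for the model (Am, Bm); control is u = K x.
    P = solve_discrete_are(Am, Bm, Q, R)
    return -np.linalg.solve(R + Bm.T @ P @ Bm, Bm.T @ P @ Am)

K = gain(A + 0.05 * rng.standard_normal((2, 2)), B)  # rough initial model
x = np.zeros(2)
X, U, Xn = [], [], []
for epoch in range(6):
    sigma = 1.0 / 2 ** epoch                  # decaying exploration noise
    for _ in range(50 * 2 ** epoch):          # doubling epoch lengths
        u = K @ x + sigma * rng.standard_normal(2)
        xn = A @ x + B @ u + 0.1 * rng.standard_normal(2)
        X.append(x); U.append(u); Xn.append(xn)
        x = xn
    # Re-estimate the dynamics from all data and re-synthesize the controller.
    Z = np.hstack([np.array(X), np.array(U)])
    Theta, *_ = np.linalg.lstsq(Z, np.array(Xn), rcond=None)
    A_hat, B_hat = Theta.T[:, :2], Theta.T[:, 2:]
    K = gain(A_hat, B_hat)
```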
We propose a new non-parametric framework for learning incrementally stable dynamical systems $\dot{x} = f(x)$ from a set of sampled trajectories. We construct a rich family of smooth vector fields induced by certain classes of matrix-valued kernels, whose equilibria are placed exactly at a desired set of locations and whose local contraction and curvature properties at various points can be explicitly controlled using convex optimization. With curl-free kernels, our framework may also be viewed as a mechanism to learn potential fields and gradient flows. We develop large-scale techniques using randomized kernel approximations in this context. We demonstrate our approach, called contracting vector fields (CVF), on imitation learning tasks involving complex point-to-point human handwriting motions.
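A heavily simplified sketch of the kernel machinery: vector-field regression with the separable matrix-valued kernel $K(x, y) = k(x, y) I$, pinning the desired equilibrium by adding the training pair $(x_\star, 0)$. The convex constraints that enforce contraction and curvature properties in CVF are omitted here.

```python
import numpy as np

rng = np.random.default_rng(6)

def k(X, Y, bw=1.0):
    # Gaussian scalar kernel evaluated between two sets of points.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bw ** 2))

# Demonstration data: samples of a field f(x) = M x that spirals into the origin.
X = rng.uniform(-1, 1, size=(200, 2))
F = X @ np.array([[-1.0, -2.0], [2.0, -1.0]]).T
x_star = np.zeros((1, 2))
X_tr = np.vstack([X, x_star])
F_tr = np.vstack([F, np.zeros((1, 2))])   # equilibrium pinned at x_star

# Kernel ridge regression with a small regularizer.
alpha = np.linalg.solve(k(X_tr, X_tr) + 1e-6 * np.eye(len(X_tr)), F_tr)

def f_hat(x):
    # Learned vector field: a smooth interpolant of the sampled dynamics.
    return k(np.atleast_2d(x), X_tr) @ alpha

print(f_hat([0.0, 0.0]))    # approximately zero at the pinned equilibrium
```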
This paper addresses the optimal control problem known as the Linear Quadratic Regulator in the case when the dynamics are unknown. We propose a multi-stage procedure, called Coarse-ID control, that estimates a model from a few experimental trials, estimates the error in that model with respect to the truth, and then designs a controller using both the model and uncertainty estimate. Our technique uses contemporary tools from random matrix theory to bound the error in the estimation procedure. We also employ a recently developed approach to control synthesis called System Level Synthesis that enables robust control design by solving a convex optimization problem. We provide end-to-end bounds on the relative error in control cost that are nearly optimal in the number of parameters and that highlight salient properties of the system to be controlled such as closed-loop sensitivity and optimal control magnitude. We show experimentally that the Coarse-ID approach enables efficient computation of a stabilizing controller in regimes where simple control schemes that do not take the model uncertainty into account fail to stabilize the true system.
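The first two stages of Coarse-ID control can be sketched as follows (the bootstrap here resamples individual transitions rather than whole rollouts, and the System Level Synthesis stage is omitted; both are simplifications of the paper's procedure):

```python
import numpy as np

rng = np.random.default_rng(7)
n, d, N, H = 2, 1, 60, 6                  # dims, number of rollouts, horizon
A = np.array([[1.01, 0.1], [0.0, 1.01]])
B = np.array([[0.0], [1.0]])

def rollout():
    # One short experimental trial driven by white-noise inputs.
    x, rows = np.zeros(n), []
    for _ in range(H):
        u = rng.standard_normal(d)
        xn = A @ x + B @ u + 0.1 * rng.standard_normal(n)
        rows.append((x, u, xn))
        x = xn
    return rows

def estimate(rows):
    # Least-squares estimate of [A B] from transition triples.
    Z = np.array([np.concatenate([x, u]) for x, u, _ in rows])
    Y = np.array([xn for _, _, xn in rows])
    Theta, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    return Theta.T[:, :n], Theta.T[:, n:]

data = [step for _ in range(N) for step in rollout()]
A_hat, B_hat = estimate(data)

# Bootstrap the data to get a crude high-probability bound on the error.
errs = []
for _ in range(200):
    boot = [data[i] for i in rng.integers(len(data), size=len(data))]
    A_b, B_b = estimate(boot)
    errs.append(np.linalg.norm(np.hstack([A_b - A_hat, B_b - B_hat]), 2))
print("bootstrap 95% error estimate:", np.quantile(errs, 0.95))
print("actual parameter error:",
      np.linalg.norm(np.hstack([A_hat - A, B_hat - B]), 2))
```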
Reinforcement learning (RL) has been successfully used to solve many continuous control tasks. Despite its impressive results, however, fundamental questions regarding the sample complexity of RL on continuous problems remain open. We study the performance of RL in this setting by considering the behavior of the Least-Squares Temporal Difference (LSTD) estimator on the classic Linear Quadratic Regulator (LQR) problem from optimal control. We give the first finite-time analysis of the number of samples needed to estimate the value function for a fixed static state-feedback policy to within $\varepsilon$-relative error. In the process of deriving our result, we give a general characterization for when the minimum eigenvalue of the empirical covariance matrix formed along the sample path of a fast-mixing stochastic process concentrates above zero, extending a result by Koltchinskii and Mendelson in the independent covariates setting. Finally, we provide experimental evidence indicating that our analysis correctly captures the qualitative behavior of LSTD on several LQR instances.
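For concreteness, here is a sketch of LSTD policy evaluation on LQR, using a discounted variant with quadratic features $\phi(x) = (\mathrm{vec}(xx^\top), 1)$ assumed for simplicity; the true value matrix comes from the corresponding Lyapunov equation.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

rng = np.random.default_rng(8)
n, T, gamma = 2, 5000, 0.95
L = np.array([[0.9, 0.05], [0.0, 0.8]])  # closed loop A + BK, assumed stable
Qpi = np.eye(n)                          # stage cost under the fixed policy

phi = lambda x: np.concatenate([np.outer(x, x).ravel(), [1.0]])

# Accumulate the LSTD normal equations along one on-policy trajectory.
x = rng.standard_normal(n)
M = np.zeros((n * n + 1, n * n + 1))
b = np.zeros(n * n + 1)
for _ in range(T):
    xn = L @ x + 0.1 * rng.standard_normal(n)
    M += np.outer(phi(x), phi(x) - gamma * phi(xn))
    b += phi(x) * (x @ Qpi @ x)
    x = xn

# Min-norm solve handles the redundancy in the vec(x x^T) features.
w, *_ = np.linalg.lstsq(M, b, rcond=None)
P_hat = w[:-1].reshape(n, n)
P_hat = 0.5 * (P_hat + P_hat.T)
P_true = solve_discrete_lyapunov(np.sqrt(gamma) * L.T, Qpi)
print("relative error:",
      np.linalg.norm(P_hat - P_true, 2) / np.linalg.norm(P_true, 2))
```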
This work explores the trade-off between the number of samples required to accurately build models of dynamical systems and the degradation of performance in various control objectives due to a coarse approximation. In particular, we show that simple models can be easily fit from input/output data and are sufficient for achieving various control objectives. We derive bounds on the number of noisy input/output samples from a stable linear time-invariant system that are sufficient to guarantee that the corresponding finite impulse response approximation is close to the true system in the $\mathcal{H}_\infty$-norm. We demonstrate that these sample requirements are lower than those derived in prior art, which aimed to identify dynamical models accurately. We also explore how different physical input constraints, such as power constraints, affect the sample complexity. Finally, we show how our analysis fits within the established framework of robust control, by demonstrating how a controller designed for an approximate system provably meets performance objectives on the true system.
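A minimal sketch of the estimation step (an assumed toy setup; the paper's constants and input constraints are not reproduced): fit a length-$r$ FIR approximation from noisy input/output data and measure the $\mathcal{H}_\infty$ error against the true response on a frequency grid.

```python
import numpy as np

rng = np.random.default_rng(9)
a, T, r = 0.8, 4000, 32                  # true system y_t = sum_k a^k u_{t-k}
g_true = a ** np.arange(200)             # long truncation standing in for the truth

u = rng.standard_normal(T)
y = np.convolve(u, g_true)[:T] + 0.1 * rng.standard_normal(T)

# Least-squares FIR fit of length r from the input/output record.
U = np.stack([np.concatenate([np.zeros(k), u[: T - k]]) for k in range(r)], axis=1)
g_hat, *_ = np.linalg.lstsq(U, y, rcond=None)

# H-infinity error approximated as the peak frequency-response mismatch.
grid = 4096
err = np.abs(np.fft.fft(g_true, grid) - np.fft.fft(g_hat, grid)).max()
print("H-infinity error of the FIR approximation:", err)
```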