We study the problem of adaptively controlling a known discrete-time nonlinear system subject to unmodeled disturbances. We prove the first finite-time regret bounds for adaptive nonlinear control with matched uncertainty in the stochastic setting, showing that the regret suffered by certainty equivalence adaptive control, compared to an oracle controller with perfect knowledge of the unmodeled disturbances, is upper bounded by $\widetilde{O}(\sqrt{T})$ in expectation. Furthermore, we show that when the input is subject to a $k$-timestep delay, the regret degrades to $\widetilde{O}(k \sqrt{T})$. Our analysis draws connections between classical stability notions in nonlinear control theory (Lyapunov stability and contraction theory) and modern regret analysis from online convex optimization. The use of stability theory allows us to analyze the challenging infinite-horizon, single-trajectory setting.
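As a rough illustration of the certainty-equivalence principle analyzed above, the following minimal sketch (our own toy model; `f`, `phi`, and all constants are illustrative assumptions, not the paper's exact setting) estimates the matched disturbance parameters by least squares and cancels the current estimate as if it were the truth:

```python
import numpy as np

# Minimal toy model of certainty-equivalence adaptive control under
# matched uncertainty (all names and constants here are illustrative):
#   x_{t+1} = f(x_t) + B (u_t + phi(x_t) @ theta_star) + w_t
# The controller regresses observed residuals onto the features phi(x)
# and cancels its current estimate as if it were the truth.

rng = np.random.default_rng(0)
n, p = 2, 3                                  # state dim, feature dim
B = np.eye(n)                                # known matched input matrix
theta_star = 0.5 * rng.normal(size=(p, n))   # true disturbance parameters

f = lambda x: 0.9 * x                        # known nominal dynamics
phi = lambda x: np.array([x[0], x[1], x[0] * x[1]])  # known features

x = rng.normal(size=n)
Phi, R = [], []                              # regression data
theta_hat = np.zeros((p, n))
for t in range(200):
    # Certainty equivalence: act as if theta_hat were exact.
    u = -0.5 * x - phi(x) @ theta_hat
    x_next = f(x) + B @ (u + phi(x) @ theta_star) + 0.01 * rng.normal(size=n)
    # The residual reveals phi(x) @ theta_star plus noise; regress it out.
    Phi.append(phi(x))
    R.append(np.linalg.solve(B, x_next - f(x)) - u)
    theta_hat = np.linalg.lstsq(np.array(Phi), np.array(R), rcond=None)[0]
    x = x_next

print("parameter estimation error:", np.linalg.norm(theta_hat - theta_star))
```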
A fundamental challenge in learning to control an unknown dynamical system is to reduce model uncertainty by making measurements while maintaining safety. In this work, we formulate a mathematical definition of what it means to safely learn a dynamical system by sequentially deciding where to initialize the next trajectory. In our framework, the state of the system is required to stay within a given safety region under the (possibly repeated) action of all dynamical systems that are consistent with the information gathered so far. For our first two results, we consider the setting of safely learning linear dynamics. We present a linear programming-based algorithm that either safely recovers the true dynamics from trajectories of length one, or certifies that safe learning is impossible. We also give an efficient semidefinite representation of the set of initial conditions whose resulting trajectories of length two are guaranteed to stay in the safety region. For our final result, we study the problem of safely learning a nonlinear dynamical system. We give a second-order cone programming-based representation of the set of initial conditions that are guaranteed to remain in the safety region after one application of the system dynamics.
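To make the flavor of the linear-programming approach concrete, here is an illustrative sketch (our own construction, not the paper's algorithm): certify that an initial condition is one-step safe under every linear system consistent with previously observed length-one trajectories, by solving one LP per facet of a polytopic safety region.

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative construction: x0 is one-step safe if every A consistent
# with the data (x_i, y_i), in the sense |A x_i - y_i| <= eps
# componentwise, keeps A x0 inside the polytope S = {x : G x <= h}.

def one_step_safe(x0, X, Y, G, h, eps=0.01):
    n = x0.size
    # Consistency constraints on vec(A) (row-major): |A x_i - y_i| <= eps.
    rows, ub = [], []
    for xi, yi in zip(X, Y):
        K = np.kron(np.eye(n), xi)        # (K @ vec(A))[k] = (A xi)[k]
        rows += [K, -K]
        ub += [yi + eps, -(yi - eps)]
    A_ub, b_ub = np.vstack(rows), np.concatenate(ub)
    # For each facet g^T x <= h_j, maximize g^T A x0 over consistent A.
    for g, hj in zip(G, h):
        c = -np.outer(g, x0).ravel()      # linprog minimizes, so negate
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(None, None))
        if not res.success or -res.fun > hj:
            return False                  # some consistent A leaves S
    return True

# Toy usage: true A, a few length-one trajectories, box safety region.
rng = np.random.default_rng(1)
A_true = np.array([[0.6, 0.2], [-0.1, 0.7]])
X = [rng.uniform(-1, 1, size=2) for _ in range(6)]
Y = [A_true @ x for x in X]
G = np.vstack([np.eye(2), -np.eye(2)])
h = np.ones(4)                            # safety region: |x|_inf <= 1
print(one_step_safe(np.array([0.3, 0.3]), X, Y, G, h))
```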
Motivated by the lack of systematic tools to obtain safe control laws for hybrid systems, we propose an optimization-based framework for learning certifiably safe control laws from data. In particular, we assume a setting in which the system dynamics are known and in which data exhibiting safe system behavior is available. We propose hybrid control barrier functions for hybrid systems as a means to synthesize safe control inputs. Based on this notion, we present an optimization-based framework to learn such hybrid control barrier functions from data. Importantly, we identify sufficient conditions on the data such that feasibility of the optimization problem ensures correctness of the learned hybrid control barrier functions, and hence the safety of the system. We illustrate our findings in two simulation studies, including a compass gait walker.
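As a simplified, single-mode illustration of the optimization-based learning step (the paper's hybrid construction adds per-mode and jump constraints; `phi`, `dphi`, and the margins below are our own illustrative choices), a linearly parameterized barrier $h_\theta(x) = \theta^\top \phi(x)$ makes all constraints linear in $\theta$, so learning reduces to a convex program:

```python
import numpy as np
import cvxpy as cp

# Single-mode sketch of learning a control barrier function from data
# for xdot = f(x) + g(x) u, using the expert's state derivatives xdot_i
# recorded along safe trajectories. The margin gamma loosely plays the
# role of the robustness required by sufficient conditions on the data.

def learn_cbf(phi, dphi, safe_X, safe_Xdot, unsafe_X,
              alpha=1.0, gamma=0.1):
    p = phi(safe_X[0]).size
    theta = cp.Variable(p)
    cons = []
    for x, xd in zip(safe_X, safe_Xdot):
        # h >= gamma on safe data, and the barrier decrease condition
        # grad h(x) @ xdot + alpha * h(x) >= 0 along safe trajectories.
        cons += [theta @ phi(x) >= gamma,
                 theta @ (dphi(x) @ xd) + alpha * (theta @ phi(x)) >= 0]
    for x in unsafe_X:
        cons += [theta @ phi(x) <= -gamma]   # h < 0 on unsafe samples
    prob = cp.Problem(cp.Minimize(cp.sum_squares(theta)), cons)
    prob.solve()
    return theta.value if prob.status == "optimal" else None
```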
Many existing tools in nonlinear control theory for establishing stability or safety of a dynamical system can be distilled to the construction of a certificate function that guarantees a desired property. However, algorithms for synthesizing certificate functions typically require a closed-form analytical expression of the underlying dynamics, which rules out their use on many modern robotic platforms. To circumvent this issue, we develop algorithms for learning certificate functions only from trajectory data. We establish bounds on the generalization error - the probability that a certificate will not certify a new, unseen trajectory - when learning from trajectories, and we convert such generalization error bounds into global stability guarantees. We demonstrate empirically that certificates for complex dynamics can be efficiently learned, and that the learned certificates can be used for downstream tasks such as adaptive control.
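A minimal sketch of the validation idea, under our own simplifying assumptions (a generic geometric decrease condition stands in for whatever certificate inequality is being learned): roll out fresh trajectories, count violations, and attach a Hoeffding-style upper confidence bound to the violation probability.

```python
import numpy as np

# Sketch: empirically bound the probability that a candidate
# certificate V fails to certify a new, unseen trajectory. Here the
# condition V(x_{t+1}) <= rho * V(x_t) is an illustrative stand-in.

def certificate_violation_bound(V, sample_traj, n_traj=500,
                                rho=0.99, delta=0.05):
    failures = 0
    for _ in range(n_traj):
        xs = sample_traj()                      # one fresh rollout
        ok = all(V(xs[t + 1]) <= rho * V(xs[t])
                 for t in range(len(xs) - 1))
        failures += not ok
    p_hat = failures / n_traj
    # Hoeffding: true violation prob <= p_hat + sqrt(log(1/delta)/(2n))
    # with probability at least 1 - delta.
    return p_hat + np.sqrt(np.log(1.0 / delta) / (2 * n_traj))

# Toy usage with a stable linear system and a quadratic certificate.
rng = np.random.default_rng(2)
A = np.array([[0.9, 0.1], [0.0, 0.8]])
V = lambda x: float(x @ x)
def sample_traj(T=50):
    x, xs = rng.normal(size=2), []
    for _ in range(T):
        xs.append(x)
        x = A @ x
    return xs
print("violation probability UCB:", certificate_violation_bound(V, sample_traj))
```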
Inspired by the success of imitation and inverse reinforcement learning in replicating expert behavior through optimal control, we propose a learning-based approach to safe controller synthesis based on control barrier functions (CBFs). We consider the setting of a known nonlinear control-affine dynamical system and assume that we have access to safe trajectories generated by an expert - a practical example of such a setting would be a kinematic model of a self-driving vehicle with safe trajectories (e.g. trajectories that avoid collisions with obstacles in the environment) generated by a human driver. We then propose and analyze an optimization-based approach to learning a CBF that enjoys provable safety guarantees under suitable Lipschitz smoothness assumptions on the underlying dynamical system. A strength of our approach is that it is agnostic to the parameterization used to represent the CBF, assuming only that the Lipschitz constant of such functions can be efficiently bounded. Furthermore, if the CBF parameterization is convex, then under mild assumptions, so is our learning process. We end with extensive numerical evaluations of our results on both planar and realistic examples, using both random feature and deep neural network parameterizations of the CBF. To the best of our knowledge, these are the first results that learn provably safe control barrier functions from data.
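For context on how a learned CBF is consumed downstream, the standard quadratic-program safety filter below minimally modifies a desired input subject to the barrier condition; this is a textbook construction rather than the paper's learning procedure, and all names are illustrative.

```python
import numpy as np
import cvxpy as cp

# Standard CBF-QP safety filter for control-affine dynamics
# xdot = f(x) + g(x) u: find the input closest to u_des that still
# satisfies the barrier condition for a (learned) CBF h.

def safety_filter(x, u_des, f, g, h, grad_h, alpha=1.0):
    m = u_des.size
    u = cp.Variable(m)
    # Barrier condition: grad h(x) @ (f(x) + g(x) u) + alpha h(x) >= 0.
    barrier = grad_h(x) @ (f(x) + g(x) @ u) + alpha * h(x)
    prob = cp.Problem(cp.Minimize(cp.sum_squares(u - u_des)),
                      [barrier >= 0])
    prob.solve()
    return u.value
```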
A major source of overfitting in model-free reinforcement learning (RL) arises when the agent mistakenly correlates reward with spurious features of the observations generated by the Markov Decision Process (MDP). We provide a general framework for analyzing this scenario, which we use to design multiple synthetic benchmarks obtained solely by modifying the observation space of an MDP. When an agent overfits to different observation spaces even though the underlying MDP dynamics are fixed, we term this observational overfitting. Our experiments expose intriguing properties, especially with regard to implicit regularization, and also corroborate results from previous work in RL generalization and supervised learning (SL).
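A toy construction in the spirit of these benchmarks (the details are our own, assuming the classic `gym` `ObservationWrapper` API): keep the underlying MDP fixed and concatenate a time-correlated spurious signal onto the observation, which an agent can latch onto during training.

```python
import numpy as np
import gym

# Illustrative benchmark construction: the wrapped MDP's dynamics and
# rewards are untouched; only the observation space changes. The
# appended features are a fixed pattern scaled by a deterministic
# function of time, so they correlate with episode progress (and hence,
# spuriously, with reward) without carrying any causal information.

class SpuriousFeatures(gym.ObservationWrapper):
    def __init__(self, env, n_spurious=16, seed=0):
        super().__init__(env)
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(n_spurious,))    # fixed per-env pattern
        self.t = 0
        # (observation_space update omitted for brevity in this sketch)

    def reset(self, **kwargs):
        self.t = 0
        return super().reset(**kwargs)

    def observation(self, obs):
        self.t += 1
        spurious = self.W * np.sin(0.01 * self.t)  # time-correlated signal
        return np.concatenate([np.asarray(obs).ravel(), spurious])
```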
We provide a brief tutorial on the use of concentration inequalities as they apply to system identification of state-space parameters of linear time-invariant systems, with a focus on the fully observed setting. We draw upon tools from the theory of large deviations and from self-normalized martingales, and provide both data-dependent and data-independent bounds on the learning rate.
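The setting can be summarized in a few lines of simulation (a minimal sketch with arbitrary constants): excite a fully observed linear system, estimate $(A, B)$ by ordinary least squares, and inspect the operator-norm error that the concentration bounds control.

```python
import numpy as np

# Fully observed system identification: x_{t+1} = A x_t + B u_t + w_t,
# with Gaussian excitation. OLS regresses next states on [x; u]; the
# tutorial's concentration bounds quantify how fast the operator-norm
# errors below shrink with the trajectory length T.

rng = np.random.default_rng(3)
n, d, T = 3, 2, 2000
A = 0.5 * np.eye(n) + 0.1 * rng.normal(size=(n, n))
B = rng.normal(size=(n, d))

X, Z = [], []                     # next states, regressors [x; u]
x = np.zeros(n)
for _ in range(T):
    u = rng.normal(size=d)
    x_next = A @ x + B @ u + 0.1 * rng.normal(size=n)
    Z.append(np.concatenate([x, u]))
    X.append(x_next)
    x = x_next

Theta = np.linalg.lstsq(np.array(Z), np.array(X), rcond=None)[0].T
A_hat, B_hat = Theta[:, :n], Theta[:, n:]
print("||A_hat - A||:", np.linalg.norm(A_hat - A, 2))
print("||B_hat - B||:", np.linalg.norm(B_hat - B, 2))
```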
Machine learning and reinforcement learning (RL) are being applied to plan and control the behavior of autonomous systems interacting with the physical world -- examples include self-driving vehicles, distributed sensor networks, and agile robots. However, if machine learning is to be applied in these new settings, the resulting algorithms must come with the reliability, robustness, and safety guarantees that are hallmarks of the control theory literature, as failures could be catastrophic. Thus, as RL algorithms are increasingly and more aggressively deployed in safety-critical settings, it is imperative that control theorists be part of the conversation. The goal of this tutorial paper is to provide a jumping-off point for control theorists wishing to work on RL-related problems by covering recent advances in bridging learning and control theory, and by placing these results within the appropriate historical context of the system identification and adaptive control literatures.
We study the sample complexity of approximate policy iteration (PI) for the Linear Quadratic Regulator (LQR), building on a recent line of work using LQR as a testbed to understand the limits of reinforcement learning (RL) algorithms on continuous control tasks. Our analysis quantifies the tension between policy improvement and policy evaluation, and suggests that policy evaluation is the dominant factor in terms of sample complexity. Specifically, we show that to obtain a controller that is within $\varepsilon$ of the optimal LQR controller, each step of policy evaluation requires at most $(n+d)^3/\varepsilon^2$ samples, where $n$ is the dimension of the state vector and $d$ is the dimension of the input vector. On the other hand, only $\log(1/\varepsilon)$ policy improvement steps suffice, resulting in an overall sample complexity of $(n+d)^3 \varepsilon^{-2} \log(1/\varepsilon)$. We furthermore build on our analysis and construct a simple adaptive procedure based on $\varepsilon$-greedy exploration which relies on approximate PI as a sub-routine and obtains $T^{2/3}$ regret, improving upon a recent result of Abbasi-Yadkori et al.
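For reference, the idealized model-based counterpart of the scheme is easy to state (a sketch under the convention $u = Kx$; the paper's algorithm replaces the exact evaluation step with LSTD-Q estimates from data): policy evaluation solves a discrete Lyapunov equation for the value matrix $P_K$, and policy improvement is greedy with respect to it.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Exact policy iteration for LQR with cost x'Qx + u'Ru and u = Kx.
# Evaluation: P_K solves P = (A+BK)' P (A+BK) + Q + K'RK.
# Improvement: K <- -(R + B'PB)^{-1} B'PA (one-step greedy under P_K).

def policy_iteration_lqr(A, B, Q, R, K, iters=20):
    for _ in range(iters):
        Acl = A + B @ K
        # solve_discrete_lyapunov(a, q) returns X with X = a X a' + q.
        P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)
        K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P

# Toy usage: stable A, so K = 0 is a valid stabilizing initial policy.
n, d = 3, 2
A = 0.9 * np.eye(n)
B = 0.5 * np.ones((n, d))
Q, R = np.eye(n), np.eye(d)
K, P = policy_iteration_lqr(A, B, Q, R, np.zeros((d, n)))
print("converged gain:\n", K)
```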