Abstract: In this paper, we propose a second-order deterministic actor-critic framework in reinforcement learning that extends the classical deterministic policy gradient method to exploit curvature information of the performance function. Building on the concept of compatible function approximation for the critic, we introduce a quadratic critic that simultaneously preserves the true policy gradient and an approximation of the performance Hessian. A least-squares temporal difference learning scheme is then developed to estimate the quadratic critic parameters efficiently. This construction enables a quasi-Newton actor update using information learned by the critic, yielding faster convergence than first-order methods. The proposed approach is general and applicable to any differentiable policy class. Numerical examples demonstrate that the method achieves improved convergence and performance over standard deterministic actor-critic baselines.
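As a rough illustration of the quasi-Newton actor update enabled by a quadratic critic, the sketch below fits a local quadratic model of a toy performance function by ordinary least squares and steps with the resulting gradient and Hessian estimates. All names and the toy objective are hypothetical, and the plain least-squares fit merely stands in for the paper's least-squares temporal difference scheme.

```python
# Hypothetical sketch: quasi-Newton actor update from a quadratic critic.
# The "critic" fits J(theta + d) ~ c + g.T d + 0.5 * d.T H d by least squares;
# the actor then steps with theta <- theta - H^{-1} g.
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -2.0])

def performance(theta):
    # Stand-in for the true performance J(theta); concave, maximum at A^-1 b.
    return -0.5 * theta @ A @ theta + b @ theta

def fit_quadratic_critic(theta, n_samples=60, radius=0.5):
    # Least-squares fit of a local quadratic model around theta
    # (stands in for the LSTD estimate of the quadratic critic parameters).
    D = rng.uniform(-radius, radius, size=(n_samples, 2))
    y = np.array([performance(theta + d) for d in D])
    X = np.column_stack([np.ones(n_samples), D[:, 0], D[:, 1],
                         0.5 * D[:, 0] ** 2, D[:, 0] * D[:, 1],
                         0.5 * D[:, 1] ** 2])
    c, *_ = np.linalg.lstsq(X, y, rcond=None)
    g = c[1:3]                                   # policy-gradient estimate
    H = np.array([[c[3], c[4]], [c[4], c[5]]])   # performance-Hessian estimate
    return g, H

theta = np.zeros(2)
for _ in range(3):
    g, H = fit_quadratic_critic(theta)
    theta = theta - np.linalg.solve(H, g)        # quasi-Newton ascent step

print(theta, np.linalg.solve(A, b))              # both print the same optimum
```

Because the toy objective is exactly quadratic, the update reaches the optimum in a single step; on a genuine performance landscape the quadratic model would only be a local fit and the update would be iterated.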
Abstract: Obtaining the solution of constrained optimization problems as a function of parameters is important in a multitude of applications, such as control and planning. Solving such parametric optimization problems in real time can present significant challenges, particularly when highly accurate solutions or batches of solutions are required. To address these challenges, we propose a learning-based iterative solver for constrained optimization that obtains fast and highly accurate solutions by customizing the solver to a specific parametric optimization problem. For a given set of parameters of the constrained optimization problem, we propose a first step with a neural network predictor that outputs primal-dual solutions of reasonable accuracy. This primal-dual solution is then refined to a very high degree of accuracy in a second step by a learned iterative solver in the form of a neural network. A novel loss function based on the Karush-Kuhn-Tucker conditions of optimality is introduced, enabling fully self-supervised training of both neural networks without the need for prior sampling of solutions from an optimizer. Evaluation on a variety of quadratic and nonlinear parametric test problems demonstrates that the predictor alone is already competitive with recent self-supervised schemes for approximating optimal solutions. The second step of our proposed learning-based iterative constrained optimizer achieves solutions with orders of magnitude better accuracy than other learning-based approaches, while being faster to evaluate than state-of-the-art solvers and natively allowing for GPU parallelization.
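As a minimal sketch of the self-supervised principle, assuming a toy parametric quadratic program min_x 0.5 x^T Q x + p^T x subject to G x <= h with the linear cost p as the varying parameter, a KKT-residual loss can train a primal-dual predictor without labeled solutions. The network, problem data, and training setup below are illustrative choices, not the paper's architecture.

```python
# Hypothetical sketch: KKT-residual loss for self-supervised training of a
# primal-dual predictor on the toy QP  min_x 0.5 x^T Q x + p^T x  s.t. G x <= h.
import torch

def kkt_loss(x, lam, Q, p, G, h):
    stat = x @ Q + p + lam @ G        # stationarity: Q x + p + G^T lam = 0
    slack = x @ G.T - h               # primal feasibility: G x - h <= 0
    primal = torch.relu(slack)
    dual = torch.relu(-lam)           # dual feasibility: lam >= 0
    comp = lam * slack                # complementary slackness: lam*(G x - h) = 0
    return (stat.square().sum(-1) + primal.square().sum(-1)
            + dual.square().sum(-1) + comp.square().sum(-1)).mean()

torch.manual_seed(0)
n, m = 4, 3
Q, G, h = 2.0 * torch.eye(n), torch.randn(m, n), torch.ones(m)

net = torch.nn.Sequential(torch.nn.Linear(n, 64), torch.nn.Tanh(),
                          torch.nn.Linear(64, n + m))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(500):
    p = torch.randn(128, n)                         # sampled problem parameters
    out = net(p)
    x = out[:, :n]
    lam = torch.nn.functional.softplus(out[:, n:])  # keeps lam >= 0 by design
    loss = kkt_loss(x, lam, Q, p, G, h)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The softplus output makes the dual-feasibility term vanish by construction; it is kept in the loss for generality. In the two-step scheme described above, the learned iterative solver would drive the same residual further toward zero starting from the predictor's output.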
Abstract: Model predictive control can optimally control nonlinear systems while accounting for constraints. The control performance depends on the model accuracy and the prediction horizon. Recent advances propose to use reinforcement learning applied to a parameterized model predictive controller to recover the optimal control performance even if an imperfect model or short prediction horizons are used. However, common reinforcement learning algorithms rely on first-order updates, which only have a linear convergence rate and hence require an excessive amount of dynamic data. Higher-order updates are typically intractable if the policy is approximated with neural networks, due to the large number of parameters. In this work, we use a parameterized model predictive controller as the policy and leverage its small number of parameters to propose a trust-region constrained quasi-Newton training algorithm for policy optimization with a superlinear convergence rate. We show that the required second-order derivative information can be calculated by solving a linear system of equations. A simulation study illustrates that the proposed training algorithm outperforms other algorithms in terms of data efficiency and accuracy.
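A generic trust-region safeguarded quasi-Newton loop conveys the flavor of such a training algorithm. In the sketch below, the Rosenbrock function stands in for the closed-loop performance, the trust region is enforced by simply scaling the BFGS step, and the linear-system computation of second-order derivatives is not reproduced; all of this is illustrative rather than the paper's method.

```python
# Hypothetical sketch: trust-region safeguarded quasi-Newton (BFGS) updates
# on a generic smooth objective standing in for closed-loop MPC performance.
import numpy as np

def rosenbrock(x):
    return (1.0 - x[0]) ** 2 + 100.0 * (x[1] - x[0] ** 2) ** 2

def rosenbrock_grad(x):
    return np.array([-2.0 * (1.0 - x[0]) - 400.0 * x[0] * (x[1] - x[0] ** 2),
                     200.0 * (x[1] - x[0] ** 2)])

def tr_quasi_newton(f, grad, x, delta=1.0, max_iter=200, tol=1e-8):
    B = np.eye(len(x))                       # BFGS approximation of the Hessian
    g = grad(x)
    for _ in range(max_iter):
        p = -np.linalg.solve(B, g)           # unconstrained quasi-Newton step
        if np.linalg.norm(p) > delta:        # crude trust region: scale the step
            p *= delta / np.linalg.norm(p)
        pred = -(g @ p + 0.5 * p @ B @ p)    # decrease predicted by the model
        rho = (f(x) - f(x + p)) / max(pred, 1e-16)
        if rho > 0.1:                        # accept the step
            x_new, g_new = x + p, grad(x + p)
            s, y = x_new - x, g_new - g
            if s @ y > 1e-12:                # curvature condition: BFGS update
                Bs = B @ s
                B += np.outer(y, y) / (y @ s) - np.outer(Bs, Bs) / (s @ Bs)
            x, g = x_new, g_new
        delta *= 2.0 if rho > 0.75 else (0.5 if rho < 0.25 else 1.0)
        if np.linalg.norm(g) < tol:
            return x
    return x

x_star = tr_quasi_newton(rosenbrock, rosenbrock_grad, np.array([-1.2, 1.0]))
print(x_star)  # approaches the minimizer (1, 1)
```

The curvature condition s^T y > 0 keeps the BFGS matrix positive definite, and the ratio test between actual and predicted decrease adapts the trust radius, which is what makes the superlinearly convergent step safe far from the optimum.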
Abstract: Uncertainty quantification is an essential task in machine learning, a task in which neural networks (NNs) have traditionally not excelled. Bayesian neural networks (BNNs), in which parameters and predictions are probability distributions, can be a remedy for some applications, but often require expensive sampling for training and inference. NNs with a Bayesian last layer (BLL) are simplified BNNs where only the weights in the last layer and the predictions follow a normal distribution. They are conceptually related to Bayesian linear regression (BLR), which has recently gained popularity in learning-based control under uncertainty. Both consider a nonlinear feature space that is linearly mapped to the output, and hyperparameters such as the noise variance. For NNs with BLL, these hyperparameters should also include the deterministic weights of all other layers, as these impact the feature space and thus the predictive performance. Unfortunately, the marginal likelihood is expensive to evaluate in this setting and prohibits direct training through back-propagation. In this work, we present a reformulation of the BLL log-marginal likelihood which considers weights in previous layers as hyperparameters and allows for efficient training through back-propagation. Furthermore, we derive a simple method to improve the extrapolation uncertainty of NNs with BLL. In a multivariate toy example and in a dynamic system identification task, we show that NNs with BLL, trained with our proposed algorithm, outperform standard BLR with NN features.
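For intuition, the following sketch maximizes the textbook Bayesian linear regression log-evidence over the weights of the feature network by back-propagation, treating the prior and noise precisions as trainable hyperparameters. This illustrates the training principle only; the paper's specific reformulation and its extrapolation-uncertainty correction are not reproduced, and all names below are illustrative.

```python
# Hypothetical sketch: training an NN with Bayesian last layer by maximizing
# the BLR log-marginal likelihood (evidence) over feature-network weights.
import math
import torch

torch.manual_seed(0)
x = torch.linspace(-3, 3, 100).unsqueeze(-1)
y = torch.sin(x) + 0.1 * torch.randn_like(x)

feat = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(),
                           torch.nn.Linear(32, 16), torch.nn.Tanh())
log_alpha = torch.zeros((), requires_grad=True)   # log prior precision
log_beta = torch.zeros((), requires_grad=True)    # log noise precision

def log_evidence(Phi, y, alpha, beta):
    # Closed-form marginal likelihood of the linear-Gaussian last layer.
    n, m = Phi.shape
    A = alpha * torch.eye(m) + beta * Phi.T @ Phi   # posterior precision
    m_N = beta * torch.linalg.solve(A, Phi.T @ y)   # posterior mean
    err = (beta / 2) * (y - Phi @ m_N).square().sum() \
        + (alpha / 2) * m_N.square().sum()
    return (m / 2) * torch.log(alpha) + (n / 2) * torch.log(beta) \
        - err - 0.5 * torch.logdet(A) - (n / 2) * math.log(2 * math.pi)

opt = torch.optim.Adam(list(feat.parameters()) + [log_alpha, log_beta], lr=1e-2)
for step in range(500):
    Phi = feat(x)                                   # deterministic features
    loss = -log_evidence(Phi, y, log_alpha.exp(), log_beta.exp())
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because the marginal likelihood of the last layer is available in closed form once the features are fixed, every quantity above is differentiable with respect to the feature-network weights, which is what makes training the "hyperparameters" of all previous layers by back-propagation possible.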