
Abstract:Line search (or backtracking) procedures have been widely employed in first-order methods for solving convex optimization problems, especially those with unknown problem parameters (e.g., the Lipschitz constant). In this paper, we show that line search is superfluous in attaining the optimal rate of convergence for solving a convex optimization problem whose parameters are not given a priori. In particular, we present a novel accelerated gradient descent type algorithm called the auto-conditioned fast gradient method (AC-FGM) that can achieve an optimal $\mathcal{O}(1/k^2)$ rate of convergence for smooth convex optimization without requiring an estimate of a global Lipschitz constant or the use of line search procedures. We then extend AC-FGM to solve convex optimization problems with Hölder continuous gradients and show that it automatically achieves the optimal rates of convergence uniformly for all problem classes, with the desired accuracy of the solution as the only input. Finally, we report some encouraging numerical results that demonstrate the advantages of AC-FGM over previously developed parameter-free methods for convex optimization.
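
For context, the following is a minimal sketch of the kind of backtracking procedure that the abstract says AC-FGM avoids. It is plain gradient descent with a doubling guess for the Lipschitz constant, not AC-FGM itself; the function names, the starting guess `L0`, and the quadratic test problem are all assumptions made for illustration.

```python
def backtracking_gradient_descent(f, grad, x0, L0=1.0, iters=500):
    """Gradient descent that guesses the Lipschitz constant L by doubling."""
    x, L = list(x0), L0
    for _ in range(iters):
        g = grad(x)
        g_sq = sum(gi * gi for gi in g)
        if g_sq == 0.0:
            break  # stationary point reached
        while True:
            trial = [xi - gi / L for xi, gi in zip(x, g)]
            # Sufficient-decrease test; it always passes once L >= true L.
            if f(trial) <= f(x) - g_sq / (2.0 * L):
                break
            L *= 2.0  # guess was too small: double it and retry
        x = trial
    return x, L


# Hypothetical test problem: f(x) = 0.5 * (x1^2 + 10 * x2^2), true L = 10.
def f(x):
    return 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)


def grad(x):
    return [x[0], 10.0 * x[1]]


x_final, L_final = backtracking_gradient_descent(f, grad, [1.0, 1.0])
```

Every rejected trial costs an extra function evaluation, and the accepted guess can end up as large as twice the true constant. Both are overheads that a line-search-free method would not pay.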


Abstract:We adopt a policy optimization viewpoint towards policy evaluation for robust Markov decision processes with $\mathrm{s}$-rectangular ambiguity sets. The developed method, named first-order policy evaluation (FRPE), provides the first unified framework for robust policy evaluation in both deterministic (offline) and stochastic (online) settings, with either tabular representation or generic function approximation. In particular, we establish linear convergence in the deterministic setting, and $\tilde{\mathcal{O}}(1/\epsilon^2)$ sample complexity in the stochastic setting. FRPE also extends naturally to evaluating the robust state-action value function with $(\mathrm{s}, \mathrm{a})$-rectangular ambiguity sets. We discuss the application of the developed results to stochastic policy optimization of large-scale robust MDPs.


Abstract:We consider a class of stochastic smooth convex optimization problems under rather general assumptions on the noise in the stochastic gradient observation. As opposed to the classical problem setting in which the variance of noise is assumed to be uniformly bounded, herein we assume that the variance of stochastic gradients is related to the "sub-optimality" of the approximate solutions delivered by the algorithm. Such problems naturally arise in a variety of applications, in particular, in the well-known generalized linear regression problem in statistics. However, to the best of our knowledge, none of the existing stochastic approximation algorithms for solving this class of problems attain optimality in terms of the dependence on accuracy, problem parameters, and mini-batch size. We discuss two non-Euclidean accelerated stochastic approximation routines, stochastic accelerated gradient descent (SAGD) and stochastic gradient extrapolation (SGE), which carry a particular duality relationship. We show that both SAGD and SGE, under appropriate conditions, achieve the optimal convergence rate, attaining the optimal iteration and sample complexities simultaneously. However, the corresponding assumptions for the SGE algorithm are more general; they allow, for instance, for efficient application of SGE to statistical estimation problems under heavy-tailed noise and discontinuous score functions. We also discuss the application of SGE to problems satisfying quadratic growth conditions, and show how it can be used to recover sparse solutions. Finally, we report on some simulation experiments to illustrate the numerical performance of our proposed algorithms in high-dimensional settings.


Abstract:Optimization problems involving sequential decisions in a stochastic environment were studied in Stochastic Programming (SP), Stochastic Optimal Control (SOC) and Markov Decision Processes (MDP). In this paper we mainly concentrate on SP and SOC modelling approaches. In these frameworks there are natural situations when the considered problems are convex. The classical approach to sequential optimization is based on dynamic programming. It suffers from the so-called "Curse of Dimensionality", in that its computational complexity increases exponentially with the dimension of the state variables. Recent progress in solving convex multistage stochastic problems is based on cutting plane approximations of the cost-to-go (value) functions of dynamic programming equations. Cutting plane type algorithms in dynamical settings are one of the main topics of this paper. We also discuss Stochastic Approximation type methods applied to multistage stochastic optimization problems. From the computational complexity point of view, these two types of methods seem to be complementary to each other. Cutting plane type methods can handle multistage problems with a large number of stages, but a relatively small number of state (decision) variables. On the other hand, stochastic approximation type methods can only deal with a small number of stages, but a large number of decision variables.


Abstract:Explicit exploration in the action space was assumed to be indispensable for online policy gradient methods to avoid a drastic degradation in sample complexity, for solving general reinforcement learning problems over finite state and action spaces. In this paper, we establish for the first time an $\tilde{\mathcal{O}}(1/\epsilon^2)$ sample complexity for online policy gradient methods without incorporating any exploration strategies. The essential development consists of two new on-policy evaluation operators and a novel analysis of the stochastic policy mirror descent method (SPMD). SPMD with the first evaluation operator, called value-based estimation, tailors to the Kullback-Leibler divergence. Provided the Markov chains on the state space of generated policies are uniformly mixing with non-diminishing minimal visitation measure, an $\tilde{\mathcal{O}}(1/\epsilon^2)$ sample complexity is obtained with a linear dependence on the size of the action space. SPMD with the second evaluation operator, namely truncated on-policy Monte Carlo (TOMC), attains an $\tilde{\mathcal{O}}(\mathcal{H}_{\mathcal{D}}/\epsilon^2)$ sample complexity, where $\mathcal{H}_{\mathcal{D}}$ mildly depends on the effective horizon and the size of the action space with properly chosen Bregman divergence (e.g., Tsallis divergence). SPMD with TOMC also exhibits stronger convergence properties in that it controls the optimality gap with high probability rather than in expectation. In contrast to explicit exploration, these new policy gradient methods can prevent repeatedly committing to potentially high-risk actions when searching for optimal policies.
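
As a rough illustration of why the Kullback-Leibler divergence is convenient in policy mirror descent, the per-state update has a closed form. The sketch below assumes a cost-minimization convention (flip the sign of `q_s` for rewards) and takes the state-action values as given fixed numbers; it does not implement the value-based or TOMC evaluation operators from the abstract, and the function name and inputs are hypothetical.

```python
import math


def kl_mirror_descent_step(pi_s, q_s, eta):
    """One KL-proximal policy update at a single state.

    Solves  min_p  eta * <q_s, p> + KL(p || pi_s)  over the simplex, whose
    closed form is p(a) proportional to pi_s(a) * exp(-eta * q_s(a)).
    """
    shift = min(q_s)  # subtract the minimum for numerical stability
    weights = [p * math.exp(-eta * (q - shift)) for p, q in zip(pi_s, q_s)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical state with three actions and fixed action costs.
pi = [1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0]
q = [1.0, 0.5, 2.0]
for _ in range(50):
    pi = kl_mirror_descent_step(pi, q, eta=0.5)
```

With fixed costs the policy concentrates on the cheapest action (index 1 here) while every action keeps a strictly positive probability at each finite iteration.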


Authors:Guanghui Lan

Abstract:Reinforcement learning (RL) problems over general state and action spaces are notoriously challenging. In contrast to the tabular setting, one cannot enumerate all the states and then iteratively update the policies for each state. This prevents the application of many well-studied RL methods, especially those with provable convergence guarantees. In this paper, we first present a substantial generalization of the recently developed policy mirror descent method to deal with general state and action spaces. We introduce new approaches to incorporate function approximation into this method, so that we do not need to use explicit policy parameterization at all. Moreover, we present a novel policy dual averaging method for which possibly simpler function approximation techniques can be applied. We establish a linear convergence rate to global optimality or sublinear convergence to stationarity for these methods applied to solve different classes of RL problems under exact policy evaluation. We then define proper notions of the approximation errors for policy evaluation and investigate their impact on the convergence of these methods applied to general-state RL problems with either finite-action or continuous-action spaces. To the best of our knowledge, the development of these algorithmic frameworks as well as their convergence analysis appear to be new in the literature.


Abstract:Risk and sparsity requirements often need to be enforced simultaneously in many applications, e.g., in portfolio optimization, assortment planning, and treatment planning. Properly balancing these potentially conflicting requirements entails the formulation of functional constrained optimization with either convex or nonconvex objectives. In this paper, we focus on projection-free methods that can generate a sparse trajectory for solving these challenging functional constrained optimization problems. Specifically, for the convex setting, we propose a Level Conditional Gradient (LCG) method, which leverages a level-set framework to update the approximation of the optimal value and an inner conditional gradient oracle (CGO) for solving mini-max subproblems. We show that the method achieves $\mathcal{O}\big(\frac{1}{\epsilon^2}\log\frac{1}{\epsilon}\big)$ iteration complexity for solving both smooth and nonsmooth cases without dependency on a possibly large size of optimal dual Lagrange multiplier. For the nonconvex setting, we introduce the Level Inexact Proximal Point (IPP-LCG) method and the Direct Nonconvex Conditional Gradient (DNCG) method. The first approach taps into the advantage of LCG by transforming the problem into a series of convex subproblems and exhibits an $\mathcal{O}\big(\frac{1}{\epsilon^3}\log\frac{1}{\epsilon}\big)$ iteration complexity for finding an ($\epsilon,\epsilon$)-KKT point. The DNCG is the first single-loop projection-free method, with iteration complexity bounded by $\mathcal{O}\big(1/\epsilon^4\big)$ for computing a so-called $\epsilon$-Wolfe point. We demonstrate the effectiveness of LCG, IPP-LCG and DNCG by devising formulations and conducting numerical experiments on two risk averse sparse optimization applications: a portfolio selection problem with and without cardinality requirement, and a radiation therapy planning problem in healthcare.
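
To show what "projection-free" and "sparse trajectory" mean in practice, here is a minimal sketch of the classical conditional gradient (Frank-Wolfe) method over the probability simplex. It is the basic building block only, not LCG, IPP-LCG, or DNCG, and it handles no functional constraints; the objective and the `target` vector are hypothetical.

```python
def frank_wolfe_simplex(grad, n, iters):
    """Conditional gradient over the probability simplex in R^n."""
    x = [0.0] * n
    x[0] = 1.0  # start at a vertex, i.e. a 1-sparse point
    for k in range(iters):
        g = grad(x)
        # Linear minimization oracle: the best vertex is the coordinate
        # with the smallest gradient entry, so no projection is needed.
        j = min(range(n), key=lambda i: g[i])
        step = 2.0 / (k + 2.0)
        x = [(1.0 - step) * xi for xi in x]
        x[j] += step  # move toward vertex e_j
    return x


# Hypothetical objective: f(x) = 0.5 * ||x - target||^2.
target = [0.6, 0.4, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


def objective(x):
    return 0.5 * sum((xi - ti) ** 2 for xi, ti in zip(x, target))


def grad(x):
    return [xi - ti for xi, ti in zip(x, target)]


x = frank_wolfe_simplex(grad, n=8, iters=200)
```

Each iteration moves toward a single vertex, so it adds at most one new nonzero coordinate. That is the source of the sparsity that projection-free methods are valued for.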


Abstract:We consider the problem of solving a robust Markov decision process (MDP), which involves a set of discounted, finite state, finite action space MDPs with uncertain transition kernels. The goal of planning is to find a robust policy that optimizes the worst-case values against the transition uncertainties, and thus encompasses standard MDP planning as a special case. For $(\mathbf{s},\mathbf{a})$-rectangular uncertainty sets, we develop a policy-based first-order method, namely the robust policy mirror descent (RPMD), and establish an $\mathcal{O}(\log(1/\epsilon))$ and $\mathcal{O}(1/\epsilon)$ iteration complexity for finding an $\epsilon$-optimal policy, with two increasing-stepsize schemes. The prior convergence of RPMD is applicable to any Bregman divergence, provided the policy space has bounded radius measured by the divergence when centering at the initial policy. Moreover, when the Bregman divergence corresponds to the squared Euclidean distance, we establish an $\mathcal{O}(\max \{1/\epsilon, 1/(\eta \epsilon^2)\})$ complexity of RPMD with any constant stepsize $\eta$. For a general class of Bregman divergences, a similar complexity is also established for RPMD with constant stepsizes, provided the uncertainty set satisfies relative strong convexity. We further develop a stochastic variant, named SRPMD, for when the first-order information is only available through online interactions with the nominal environment. For general Bregman divergences, we establish an $\mathcal{O}(1/\epsilon^2)$ and $\mathcal{O}(1/\epsilon^3)$ sample complexity with two increasing-stepsize schemes. For the Euclidean Bregman divergence, we establish an $\mathcal{O}(1/\epsilon^3)$ sample complexity with constant stepsizes. To the best of our knowledge, all the aforementioned results appear to be new for policy-based first-order methods applied to the robust MDP problem.
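
The "worst-case values" in this abstract can be made concrete with a small sketch of robust policy evaluation under $(\mathbf{s},\mathbf{a})$-rectangularity. It is a plain fixed-point iteration, not RPMD or SRPMD, and it assumes a cost-minimization convention and finite uncertainty sets (a short list of candidate transition rows per state-action pair). All numbers below are made up for illustration.

```python
def robust_policy_evaluation(cost, kernels, policy, gamma=0.9, iters=500):
    """Worst-case discounted cost of a fixed policy.

    cost[s][a]    : immediate cost
    kernels[s][a] : finite list of candidate transition rows, i.e. an
                    (s, a)-rectangular uncertainty set (assumed finite here)
    policy[s][a]  : probability of action a in state s
    """
    n = len(cost)
    v = [0.0] * n
    for _ in range(iters):
        new_v = []
        for s in range(n):
            total = 0.0
            for a, prob in enumerate(policy[s]):
                # Rectangularity lets the adversary pick the worst row
                # independently for every (state, action) pair.
                worst = max(sum(p * vj for p, vj in zip(row, v))
                            for row in kernels[s][a])
                total += prob * (cost[s][a] + gamma * worst)
            new_v.append(total)
        v = new_v
    return v


# Hypothetical 2-state, 2-action problem; the first row of each set
# plays the role of the nominal kernel.
cost = [[1.0, 2.0], [0.0, 0.5]]
kernels = [
    [[[0.9, 0.1], [0.6, 0.4]], [[0.2, 0.8], [0.5, 0.5]]],
    [[[0.7, 0.3], [0.4, 0.6]], [[0.1, 0.9], [0.3, 0.7]]],
]
nominal = [[[rows[0]] for rows in state] for state in kernels]
policy = [[0.5, 0.5], [0.5, 0.5]]

v_robust = robust_policy_evaluation(cost, kernels, policy)
v_nominal = robust_policy_evaluation(cost, nominal, policy)
```

Because each uncertainty set contains the nominal row, the robust value is never below the nominal one. Shrinking every set to a single row recovers standard policy evaluation, matching the abstract's remark that standard MDP planning is a special case.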


Abstract:We study the problem of average-reward Markov decision processes (AMDPs) and develop novel first-order methods with strong theoretical guarantees for both policy evaluation and optimization. Existing on-policy evaluation methods suffer from sub-optimal convergence rates as well as failure in handling insufficiently random policies, e.g., deterministic policies, for lack of exploration. To remedy these issues, we develop a novel variance-reduced temporal difference (VRTD) method with linear function approximation for randomized policies along with optimal convergence guarantees, and an exploratory variance-reduced temporal difference (EVRTD) method for insufficiently random policies with comparable convergence guarantees. We further establish linear convergence rate on the bias of policy evaluation, which is essential for improving the overall sample complexity of policy optimization. On the other hand, compared with intensive research interest in finite sample analysis of policy gradient methods for discounted MDPs, existing studies on policy gradient methods for AMDPs mostly focus on regret bounds under restrictive assumptions on the underlying Markov processes (see, e.g., Abbasi-Yadkori et al., 2019), and they often lack guarantees on the overall sample complexities. Towards this end, we develop an average-reward variant of the stochastic policy mirror descent (SPMD) (Lan, 2022). We establish the first $\widetilde{\mathcal{O}}(\epsilon^{-2})$ sample complexity for solving AMDPs with policy gradient method under both the generative model (with unichain assumption) and Markovian noise model (with ergodic assumption). This bound can be further improved to $\widetilde{\mathcal{O}}(\epsilon^{-1})$ for solving regularized AMDPs. Our theoretical advantages are corroborated by numerical experiments.


Abstract:This paper studies the communication complexity of risk averse optimization over a network. The problem generalizes the well-studied risk-neutral finite-sum distributed optimization problem and its importance stems from the need to handle risk in an uncertain environment. For algorithms in the literature, there exists a gap in communication complexities for solving risk-averse and risk-neutral problems. We propose two distributed algorithms, namely the distributed risk averse optimization (DRAO) method and the distributed risk averse optimization with sliding (DRAO-S) method, to close the gap. Specifically, the DRAO method achieves the optimal communication complexity by assuming a certain saddle point subproblem can be easily solved in the server node. The DRAO-S method removes the strong assumption by introducing a novel saddle point sliding subroutine which only requires the projection over the ambiguity set $P$. We observe that the number of $P$-projections performed by DRAO-S is optimal. Moreover, we develop matching lower complexity bounds to show that communication complexities of both DRAO and DRAO-S are not improvable. Numerical experiments are conducted to demonstrate the encouraging empirical performance of the DRAO-S method.
