Title: All-Action Policy Gradient Methods: A Numerical Integration Approach

Oct 21, 2019

Benjamin Petit, Loren Amdahl-Culleton, Yao Liu, Jimmy Smith, Pierre-Luc Bacon

Abstract: While often stated as an instance of the likelihood ratio trick [Rubinstein, 1989], the original policy gradient theorem [Sutton, 1999] involves an integral over the action space. When this integral can be computed, the resulting "all-action" estimator [Sutton, 2001] provides a conditioning effect [Bratley, 1987] reducing the variance significantly compared to the REINFORCE estimator [Williams, 1992]. In this paper, we adopt a numerical integration perspective to broaden the applicability of the all-action estimator to general spaces and to any function class for the policy or critic components, beyond the Gaussian case considered by [Ciosek, 2018]. In addition, we provide a new theoretical result on the effect of using a biased critic which offers more guidance than the previous "compatible features" condition of [Sutton, 1999]. We demonstrate the benefit of our approach in continuous control tasks with nonlinear function approximation. Our results show improved performance and sample efficiency.
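To make the contrast in the abstract concrete, here is a minimal sketch, not the authors' implementation. It assumes a toy single-state problem with a one-dimensional Gaussian policy and a hypothetical critic `q_value(a) = -(a - 1)^2`, and it uses a plain trapezoid rule as a stand-in for whatever quadrature scheme the paper actually uses. All function names and parameters are illustrative. It compares a single-sample REINFORCE estimate of the gradient with respect to the policy mean against an all-action estimate that integrates over the action space numerically.

```python
import numpy as np

def q_value(a):
    # Hypothetical critic chosen for illustration; not taken from the paper.
    return -(a - 1.0) ** 2

def gaussian_pdf(a, mu, sigma):
    # Density of the assumed Gaussian policy pi(a) = N(mu, sigma^2).
    return np.exp(-0.5 * ((a - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def score_mu(a, mu, sigma):
    # Gradient of log pi(a) with respect to the mean mu.
    return (a - mu) / sigma ** 2

def reinforce_grad(mu, sigma, rng, n_samples=1):
    # Likelihood-ratio (REINFORCE) estimate from sampled actions.
    a = rng.normal(mu, sigma, size=n_samples)
    return np.mean(q_value(a) * score_mu(a, mu, sigma))

def all_action_grad(mu, sigma, n_points=201, width=8.0):
    # All-action estimate: integrate pi(a) * grad log pi(a) * Q(a) over a
    # grid covering +/- width standard deviations, using the trapezoid rule.
    a = np.linspace(mu - width * sigma, mu + width * sigma, n_points)
    integrand = gaussian_pdf(a, mu, sigma) * score_mu(a, mu, sigma) * q_value(a)
    da = a[1] - a[0]
    return da * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mu, sigma = 0.0, 0.5
    samples = np.array([reinforce_grad(mu, sigma, rng) for _ in range(10000)])
    print("REINFORCE mean:", samples.mean(), "variance:", samples.var())
    print("all-action (quadrature):", all_action_grad(mu, sigma))
```

For this toy critic the exact gradient is -2(mu - 1), which is 2 at mu = 0. The quadrature estimate is deterministic given the state, so all of its error comes from the integration rule and the critic, whereas the single-sample REINFORCE estimate is unbiased but noisy. In a full reinforcement learning setting the states are still sampled, so the all-action estimator reduces variance rather than eliminating it.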

* 9 pages, 2 figures. NeurIPS 2019 Optimization Foundations of Reinforcement Learning Workshop
