Title: Expected Sarsa($\lambda$) with Control Variate for Variance Reduction

Jun 25, 2019

Long Yang, Yu Zhang

Figures 1-4 for Expected Sarsa($\lambda$) with Control Variate for Variance Reduction

Abstract: Off-policy learning is a powerful tool in reinforcement learning. However, the high variance of off-policy evaluation is a critical challenge, which causes off-policy learning with function approximation to fall into uncontrolled instability. To reduce this variance, we introduce the control variate technique to Expected Sarsa($\lambda$) and propose a tabular ES($\lambda$)-CV algorithm. We prove that, given a proper estimator of the value function, the proposed ES($\lambda$)-CV has lower variance than Expected Sarsa($\lambda$). Furthermore, to extend ES($\lambda$)-CV into a convergent algorithm with linear function approximation, we propose the GES($\lambda$) algorithm under a convex-concave saddle-point formulation. We prove that GES($\lambda$) achieves a convergence rate of $\mathcal{O}(1/T)$, which matches or outperforms several state-of-the-art gradient-based algorithms while using a more relaxed step-size. Numerical experiments show that the proposed algorithm is stable and converges faster, with lower variance, than several state-of-the-art gradient-based TD learning algorithms: GQ($\lambda$), GTB($\lambda$), and ABQ($\zeta$).
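To make the control variate idea concrete, here is a minimal Python sketch of the generic technique in a single-state, off-policy setting. It is not the paper's ES($\lambda$)-CV or GES($\lambda$) algorithm. The policies `pi` and `mu`, the values `q_true` and `q_hat`, the noise level, and the helper `is_estimates` are all hypothetical, chosen only for illustration. The sketch compares a plain importance-sampled estimate of the target-policy value with one that subtracts an approximate action value and adds back its exact expectation under the target policy. Both estimators are unbiased, and the second one should have lower variance when `q_hat` is close to the true action values, which echoes the abstract's condition of having a proper estimator of the value function.

```python
import numpy as np

def is_estimates(rng, pi, mu, q_true, q_hat, noise_std, n):
    """Return (plain, cv): n single-sample estimates of V_pi = sum_a pi(a) q_true(a).

    All arguments are illustrative assumptions, not quantities from the paper.
    """
    actions = rng.choice(len(pi), size=n, p=mu)   # actions drawn from behavior policy mu
    rho = pi[actions] / mu[actions]               # importance sampling ratios pi/mu
    # Noisy sampled returns whose conditional mean is q_true[action].
    returns = q_true[actions] + noise_std * rng.standard_normal(n)

    # Plain importance-sampled estimator.
    plain = rho * returns

    # Control variate: subtract rho * q_hat(a), add back its known mean E_pi[q_hat].
    baseline = np.dot(pi, q_hat)
    cv = rho * (returns - q_hat[actions]) + baseline
    return plain, cv

rng = np.random.default_rng(0)
pi = np.array([0.7, 0.2, 0.1])                    # hypothetical target policy
mu = np.array([1 / 3, 1 / 3, 1 / 3])              # hypothetical behavior policy
q_true = np.array([1.0, 2.0, 3.0])                # hypothetical true action values
q_hat = q_true + np.array([0.1, -0.1, 0.05])      # imperfect value estimate

plain, cv = is_estimates(rng, pi, mu, q_true, q_hat, noise_std=0.5, n=200_000)
v_pi = float(np.dot(pi, q_true))

print("true value      :", v_pi)
print("plain mean / var:", plain.mean(), plain.var())
print("cv mean / var   :", cv.mean(), cv.var())
```

The correction term `baseline - rho * q_hat[actions]` has zero mean under the behavior policy, so it leaves the expectation unchanged. How much variance it removes depends on how well `q_hat` tracks the true values. With a poor `q_hat` the control variate can fail to help.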
