Title: Two Time-scale Off-Policy TD Learning: Non-asymptotic Analysis over Markovian Samples

Sep 26, 2019

Tengyu Xu, Shaofeng Zou, Yingbin Liang

Abstract: Gradient-based temporal difference (GTD) algorithms are widely used in off-policy learning scenarios. Among them, the two time-scale TD with gradient correction (TDC) algorithm has been shown to have superior performance. In contrast to previous studies that characterized the non-asymptotic convergence rate of TDC only under independent and identically distributed (i.i.d.) data samples, we provide the first non-asymptotic convergence analysis for two time-scale TDC under a non-i.i.d. Markovian sample path and linear function approximation. We show that the two time-scale TDC can converge as fast as O(log t / t^(2/3)) under diminishing stepsize, and can converge exponentially fast under constant stepsize, but at the cost of a non-vanishing error. We further propose a TDC algorithm with blockwisely diminishing stepsize, and show that it asymptotically converges with an arbitrarily small error at a blockwisely linear convergence rate. Our experiments demonstrate that such an algorithm converges as fast as TDC under constant stepsize, and still enjoys accuracy comparable to that of TDC under diminishing stepsize.

* To appear in NeurIPS 2019
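
For readers unfamiliar with the algorithm the abstract analyzes, the following is a minimal sketch of one two time-scale TDC update with linear function approximation, written in the standard form from the GTD literature. It is not taken from the paper: the function names (`dot`, `tdc_step`, `diminishing_stepsize`), the argument layout, and the polynomial stepsize form are assumptions made for illustration, and the projection step that such analyses commonly include is omitted.

```python
def dot(u, v):
    """Inner product of two equal-length lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def tdc_step(theta, w, phi, phi_next, reward, rho, gamma, alpha, beta):
    """One TDC update on a single transition (hypothetical helper).

    theta    -- main value-function parameters (updated with stepsize alpha)
    w        -- auxiliary correction parameters (updated with stepsize beta)
    phi      -- feature vector of the current state
    phi_next -- feature vector of the next state
    rho      -- importance sampling ratio, target policy over behavior policy
    """
    # Temporal difference error under the current linear estimate.
    delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)
    phi_w = dot(phi, w)
    # TD step plus the gradient correction term that gives TDC its name.
    new_theta = [
        th + alpha * rho * (delta * p - gamma * pn * phi_w)
        for th, p, pn in zip(theta, phi, phi_next)
    ]
    # Auxiliary update: w tracks the projection of the TD error onto phi.
    new_w = [wi + beta * (rho * delta - phi_w) * p for wi, p in zip(w, phi)]
    return new_theta, new_w


def diminishing_stepsize(t, c, exponent):
    """Assumed polynomial schedule c / (1 + t)^exponent, for illustration."""
    return c / (1.0 + t) ** exponent


# Example: one update on a made-up transition with two features.
theta, w = [0.0, 0.0], [0.0, 0.0]
alpha = diminishing_stepsize(0, 0.1, 0.9)  # stepsize for theta
beta = diminishing_stepsize(0, 0.2, 0.6)   # slower-decaying stepsize for w
theta, w = tdc_step(theta, w, [1.0, 0.0], [0.0, 1.0],
                    reward=1.0, rho=1.0, gamma=0.5, alpha=alpha, beta=beta)
```

In this sketch the two time-scales come from giving `w` a stepsize that decays more slowly than the one for `theta`, which is the usual two time-scale convention. The specific constants and exponents above are placeholders, not values from the paper.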
