Title: Fixed-Horizon Temporal Difference Methods for Stable Reinforcement Learning

Sep 09, 2019

Kristopher De Asis, Alan Chan, Silviu Pitis, Richard S. Sutton, Daniel Graves

Abstract: We explore fixed-horizon temporal difference (TD) methods, reinforcement learning algorithms for a new kind of value function that predicts the sum of rewards over a $\textit{fixed}$ number of future time steps. To learn the value function for horizon $h$, these algorithms bootstrap from the value function for horizon $h-1$, or some shorter horizon. Because no value function bootstraps from itself, fixed-horizon methods are immune to the stability problems that plague other off-policy TD methods using function approximation (also known as "the deadly triad"). Although fixed-horizon methods require the storage of additional value functions, this gives the agent additional predictive power, while the added complexity can be substantially reduced via parallel updates, shared weights, and $n$-step bootstrapping. We show how to use fixed-horizon value functions to solve reinforcement learning problems competitively with methods such as Q-learning that learn conventional value functions. We also prove convergence of fixed-horizon temporal difference methods with linear and general function approximation. Taken together, our results establish fixed-horizon TD methods as a viable new way of avoiding the stability problems of the deadly triad.
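
The bootstrapping scheme described in the abstract can be illustrated with a minimal tabular sketch. It is not the authors' implementation: the function name `fixed_horizon_td`, the two-state toy chain, the step size `alpha`, and the discount `gamma` are all assumptions made for illustration. The sketch keeps one value table per horizon, fixes the horizon-0 values at zero, and updates each horizon $h$ toward the reward plus the horizon $h-1$ estimate of the next state, so no table ever bootstraps from itself.

```python
import numpy as np


def fixed_horizon_td(transitions, n_states, max_horizon, alpha=0.5, gamma=1.0):
    """Tabular one-step fixed-horizon TD prediction (illustrative sketch).

    transitions: iterable of (state, reward, next_state) tuples.
    Returns V with shape (max_horizon + 1, n_states); V[0] stays at zero
    because the sum of rewards over zero future steps is zero.
    """
    V = np.zeros((max_horizon + 1, n_states))
    for s, r, s_next in transitions:
        # Go from the longest horizon down so that V[h - 1] still holds
        # its pre-update value when V[h] reads it (matters if s_next == s).
        for h in range(max_horizon, 0, -1):
            target = r + gamma * V[h - 1, s_next]  # bootstrap from horizon h-1
            V[h, s] += alpha * (target - V[h, s])
    return V


# Hypothetical toy problem: two states that alternate, reward 1 on every step.
# With gamma = 1, the true horizon-h value of either state is exactly h.
cycle = [(0, 1.0, 1), (1, 1.0, 0)]
V = fixed_horizon_td(cycle * 200, n_states=2, max_horizon=3)
print(np.round(V, 3))
```

In this toy chain the rows of `V` approach 0, 1, 2, and 3, one row per horizon. The descending loop over `h` is one way to get the effect of a parallel update without copying the table; the weight sharing and $n$-step bootstrapping mentioned in the abstract are not shown here.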
