Title: Inverse Policy Evaluation for Value-based Sequential Decision-making

Aug 26, 2020

Alan Chan, Kris de Asis, Richard S. Sutton

Abstract: Value-based methods for reinforcement learning lack generally applicable ways to derive behavior from a value function. Many approaches involve approximate value iteration (e.g., $Q$-learning) and acting greedily with respect to the estimates, with an arbitrary degree of entropy added to ensure that the state space is sufficiently explored. Behavior based on explicit greedification assumes that the values reflect those of \textit{some} policy, over which the greedy policy will be an improvement. However, value iteration can produce value functions that do not correspond to \textit{any} policy. This is especially relevant in the function-approximation regime, when the true value function cannot be perfectly represented. In this work, we explore the use of \textit{inverse policy evaluation}, the process of solving for a likely policy given a value function, for deriving behavior from a value function. We provide theoretical and empirical results to show that inverse policy evaluation, combined with an approximate value iteration algorithm, is a feasible method for value-based control.
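To make the idea of solving for a policy given a value function concrete, here is a minimal tabular sketch in Python. It is an illustration under stated assumptions, not the authors' algorithm: it assumes a known model (the hypothetical `rewards` and `transitions` tables), treats each state independently, and picks one of the many policies whose one-step backed-up value matches the target by mixing the lowest- and highest-valued actions. The function names `backed_up_action_values` and `inverse_policy_evaluation_state` are invented for this example. When the target value lies outside the range of achievable backed-up values, no policy matches it exactly, which mirrors the abstract's point that a value function need not correspond to any policy.

```python
from typing import List, Tuple


def backed_up_action_values(
    v: List[float],
    rewards: List[List[float]],
    transitions: List[List[List[float]]],
    gamma: float,
    s: int,
) -> List[float]:
    """One-step lookahead: q(s, a) = r(s, a) + gamma * sum_s' p(s' | s, a) * v(s')."""
    return [
        rewards[s][a]
        + gamma * sum(p * v[s2] for s2, p in enumerate(transitions[s][a]))
        for a in range(len(rewards[s]))
    ]


def inverse_policy_evaluation_state(
    q: List[float], target: float
) -> Tuple[List[float], float]:
    """Find action probabilities whose expected q is as close to `target` as possible.

    Returns (policy, residual), where residual = target - sum_a policy[a] * q[a].
    The problem is underdetermined; this sketch (an assumption, not the paper's
    method) mixes only the lowest- and highest-valued actions.
    """
    lo, hi = min(q), max(q)
    if hi == lo:
        # All actions look identical, so any policy gives the same value.
        policy = [1.0 / len(q)] * len(q)
        return policy, target - lo
    # Mixing weight on the best action, clipped to a valid probability.
    w = min(1.0, max(0.0, (target - lo) / (hi - lo)))
    policy = [0.0] * len(q)
    policy[q.index(hi)] = w
    policy[q.index(lo)] = 1.0 - w
    achieved = w * hi + (1.0 - w) * lo
    return policy, target - achieved


# Hypothetical two-state, two-action MDP used only for illustration.
rewards = [[0.0, 2.0], [0.0, 0.0]]
transitions = [
    [[1.0, 0.0], [0.0, 1.0]],  # state 0: action 0 stays, action 1 moves to state 1
    [[0.0, 1.0], [0.0, 1.0]],  # state 1: absorbing
]
v = [1.0, 0.0]  # a candidate value function, e.g. from approximate value iteration

q0 = backed_up_action_values(v, rewards, transitions, gamma=0.9, s=0)
policy0, residual0 = inverse_policy_evaluation_state(q0, target=v[0])
print(policy0, residual0)
```

A nonzero residual signals that the candidate value at that state cannot be reproduced by any action distribution under the assumed model; in that case the sketch falls back to the closest achievable value.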

* Submitted to NeurIPS 2020
