Title: Horizon-Free and Instance-Dependent Regret Bounds for Reinforcement Learning with General Function Approximation

Dec 07, 2023

Jiayi Huang, Han Zhong, Liwei Wang, Lin F. Yang

Abstract: To tackle long planning horizon problems in reinforcement learning with general function approximation, we propose the first algorithm, termed UCRL-WVTR, that achieves a regret bound that is both horizon-free and instance-dependent; it is horizon-free in that it eliminates the polynomial dependency on the planning horizon. The derived regret bound is sharp: when specialized to linear mixture MDPs, it matches the minimax lower bound up to logarithmic factors. Furthermore, UCRL-WVTR is computationally efficient given access to a regression oracle. Achieving such a horizon-free, instance-dependent, and sharp regret bound hinges upon (i) novel algorithmic designs: weighted value-targeted regression and a high-order moment estimator in the context of general function approximation; and (ii) fine-grained analyses: a novel concentration bound for weighted non-linear least squares and a refined analysis that leads to the tight instance-dependent bound. We also conduct comprehensive experiments to corroborate our theoretical findings.
