Polynomial Time Reinforcement Learning in Correlated FMDPs with Linear Value Functions

Jul 12, 2021
Siddartha Devic, Zihao Deng, Brendan Juba

Many reinforcement learning (RL) environments in practice feature enormous state spaces that can be described compactly by a "factored" structure and modeled as Factored Markov Decision Processes (FMDPs). We present the first polynomial-time algorithm for RL with FMDPs that does not rely on an oracle planner and that, instead of requiring a linear transition model, requires only a linear value function with a suitable local basis with respect to the factorization. Under this assumption, we can solve FMDPs in polynomial time by constructing an efficient separation oracle for convex optimization. Importantly, and in contrast to prior work, we do not assume that the transitions on the various factors are independent.
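
The abstract does not spell out which local basis is used. As a rough illustration only, the sketch below assumes one common kind of local basis: indicator features over small, hypothetical scopes of state factors, so that V(s) = sum_j w_j * phi_j(s restricted to scope S_j). The names `LocalIndicatorBasis` and `value` are ours, not the paper's, and the sketch implements neither the paper's learning algorithm nor its separation oracle. It only shows why such a value function stays compact: the number of weights grows as the sum over scopes of d^|S_j| rather than as d^n.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

State = Tuple[int, ...]


class LocalIndicatorBasis:
    """Hypothetical local basis: one indicator feature per joint assignment
    to each small scope of state factors (an illustrative assumption)."""

    def __init__(self, scopes: List[Tuple[int, ...]], domain_size: int):
        self.scopes = scopes
        # Enumerate every (scope index, assignment-to-that-scope) pair once.
        self.features: List[Tuple[int, Tuple[int, ...]]] = [
            (j, assign)
            for j, scope in enumerate(scopes)
            for assign in product(range(domain_size), repeat=len(scope))
        ]
        self.index: Dict[Tuple[int, Tuple[int, ...]], int] = {
            f: k for k, f in enumerate(self.features)
        }

    def num_features(self) -> int:
        return len(self.features)

    def active(self, state: State) -> List[int]:
        # Exactly one indicator is "on" per scope: the one matching the
        # state's values on that scope's factors.
        return [
            self.index[(j, tuple(state[i] for i in scope))]
            for j, scope in enumerate(self.scopes)
        ]


def value(weights: Sequence[float], basis: LocalIndicatorBasis, state: State) -> float:
    # Linear value function: sum of the weights of the active local features.
    return sum(weights[k] for k in basis.active(state))


if __name__ == "__main__":
    # 10 binary factors, scopes are adjacent pairs: 9 * 2^2 = 36 weights,
    # versus 2^10 = 1024 entries for a tabular value function.
    basis = LocalIndicatorBasis([(i, i + 1) for i in range(9)], domain_size=2)
    print(basis.num_features())
    print(value([1.0] * basis.num_features(), basis, (0, 1) * 5))
```

With all weights set to 1.0, every state gets value 9.0 (one active feature per scope); learning the weights is where an algorithm like the paper's would come in.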

* 30 pages, 1 figure 