One Policy is Enough: Parallel Exploration with a Single Policy is Minimax Optimal for Reward-Free Reinforcement Learning

May 31, 2022
Pedro Cisneros-Velarde, Boxiang Lyu, Sanmi Koyejo, Mladen Kolar
