One Policy is Enough: Parallel Exploration with a Single Policy is Minimax Optimal for Reward-Free Reinforcement Learning

May 31, 2022
