
Q-learning with UCB Exploration is Sample Efficient for Infinite-Horizon MDP

Jan 27, 2019
Kefan Dong, Yuanhao Wang, Xiaoyu Chen, Liwei Wang

Figure 1 for Q-learning with UCB Exploration is Sample Efficient for Infinite-Horizon MDP


View paper on arXiv
