Remi Munos

INRIA Lille

Combining policy gradient and Q-learning

Apr 07, 2017

Learning to reinforcement learn

Jan 23, 2017

Unifying Count-Based Exploration and Intrinsic Motivation

Nov 07, 2016

Q($λ$) with Off-Policy Corrections

Aug 11, 2016

Memory-Efficient Backpropagation Through Time

Jun 10, 2016

Generalized Emphatic Temporal Difference Learning: Bias-Variance Analysis

Nov 27, 2015

Bounded Regret for Finite-Armed Structured Bandits

Nov 11, 2014

Active Regression by Stratification

Oct 22, 2014

On Minimax Optimal Offline Policy Evaluation

Sep 12, 2014

Bandit Algorithms for Tree Search

Aug 09, 2014