
Richard S. Sutton

Doubly-Asynchronous Value Iteration: Making Value Iteration Asynchronous in Actions

Jul 04, 2022
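For context on the paper's title, here is a minimal sketch of standard (synchronous) value iteration, the baseline that an asynchronous variant relaxes: classic VI sweeps every state and, within each state, maximizes over every action. The toy two-state, two-action MDP below is purely illustrative and is not taken from the paper.

```python
# Minimal synchronous value iteration on a toy 2-state, 2-action MDP.
# The MDP is illustrative only. Classic VI updates all states each sweep
# and takes a full max over all actions in each state; asynchronous
# variants relax those sweeps.

# P[s][a] = list of (next_state, probability); R[s][a] = immediate reward
P = {0: {0: [(0, 1.0)], 1: [(1, 1.0)]},
     1: {0: [(0, 1.0)], 1: [(1, 1.0)]}}
R = {0: {0: 0.0, 1: 1.0},
     1: {0: 0.0, 1: 2.0}}
gamma = 0.9  # discount factor

def value_iteration(P, R, gamma, tol=1e-8):
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        new_V = {}
        for s in P:  # synchronous sweep over every state
            new_V[s] = max(
                R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                for a in P[s]  # full max over every action
            )
            delta = max(delta, abs(new_V[s] - V[s]))
        V = new_V
        if delta < tol:  # stop once the largest update is tiny
            return V

V = value_iteration(P, R, gamma)
# For this toy MDP the fixed point is V(1) = 2 / (1 - 0.9) = 20
# and V(0) = 1 + 0.9 * V(1) = 19.
```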

Toward Discovering Options that Achieve Faster Planning

May 25, 2022

The Quest for a Common Model of the Intelligent Decision Maker

Apr 08, 2022

A History of Meta-gradient: Gradient Methods for Meta-learning

Feb 20, 2022

Reward-Respecting Subtasks for Model-Based Reinforcement Learning

Feb 09, 2022

Learning Agent State Online with Recurrent Generate-and-Test

Dec 30, 2021

Average-Reward Learning and Planning with Options

Oct 26, 2021

An Empirical Comparison of Off-policy Prediction Learning Algorithms in the Four Rooms Environment

Sep 10, 2021

Continual Backprop: Stochastic Gradient Descent with Persistent Randomness

Aug 13, 2021

An Empirical Comparison of Off-policy Prediction Learning Algorithms on the Collision Task

Jun 11, 2021