
Shangtong Zhang

Transformers Learn Temporal Difference Methods for In-Context Reinforcement Learning

May 22, 2024

The ODE Method for Stochastic Approximation and Reinforcement Learning with Markovian Noise

Feb 06, 2024

AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning

Aug 07, 2023

Direct Gradient Temporal Difference Learning

Aug 02, 2023

Improving Monte Carlo Evaluation with Offline Data

Jan 31, 2023

On the Chattering of SARSA with Linear Function Approximation

Feb 14, 2022

Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch

Nov 04, 2021

Truncated Emphatic Temporal Difference Methods for Prediction and Control

Aug 11, 2021

Learning Expected Emphatic Traces for Deep RL

Jul 12, 2021

Breaking the Deadly Triad with a Target Network

Feb 09, 2021