Shangtong Zhang

Transformers Learn Temporal Difference Methods for In-Context Reinforcement Learning

May 22, 2024
Jiuqi Wang, Ethan Blaser, Hadi Daneshmand, Shangtong Zhang

The ODE Method for Stochastic Approximation and Reinforcement Learning with Markovian Noise

Feb 06, 2024
Shuze Liu, Shuhang Chen, Shangtong Zhang

AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning

Aug 07, 2023
Michaël Mathieu, Sherjil Ozair, Srivatsan Srinivasan, Caglar Gulcehre, Shangtong Zhang, Ray Jiang, Tom Le Paine, Richard Powell, Konrad Żołna, Julian Schrittwieser, David Choi, Petko Georgiev, Daniel Toyama, Aja Huang, Roman Ring, Igor Babuschkin, Timo Ewalds, Mahyar Bordbar, Sarah Henderson, Sergio Gómez Colmenarejo, Aäron van den Oord, Wojciech Marian Czarnecki, Nando de Freitas, Oriol Vinyals

Direct Gradient Temporal Difference Learning

Aug 02, 2023
Xiaochi Qian, Shangtong Zhang

Improving Monte Carlo Evaluation with Offline Data

Jan 31, 2023
Shuze Liu, Shangtong Zhang

On the Chattering of SARSA with Linear Function Approximation

Feb 14, 2022
Shangtong Zhang, Remi Tachet, Romain Laroche

Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch

Nov 04, 2021
Shangtong Zhang, Remi Tachet, Romain Laroche

Truncated Emphatic Temporal Difference Methods for Prediction and Control

Aug 11, 2021
Shangtong Zhang, Shimon Whiteson

Learning Expected Emphatic Traces for Deep RL

Jul 12, 2021
Ray Jiang, Shangtong Zhang, Veronica Chelu, Adam White, Hado van Hasselt

Breaking the Deadly Triad with a Target Network

Feb 09, 2021
Shangtong Zhang, Hengshuai Yao, Shimon Whiteson
