Zhuoran Yang

Provably Efficient Generative Adversarial Imitation Learning for Online and Offline Setting with Linear Function Approximation

Aug 19, 2021

Online Bootstrap Inference For Policy Evaluation in Reinforcement Learning

Aug 08, 2021

Towards General Function Approximation in Zero-Sum Markov Games

Jul 30, 2021

A Unified Off-Policy Evaluation Approach for General Value Function

Jul 06, 2021

Gap-Dependent Bounds for Two-Player Markov Games

Jul 01, 2021

Randomized Exploration for Reinforcement Learning with General Value Function Approximation

Jun 15, 2021

Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality

Feb 27, 2021

Instrumental Variable Value Iteration for Causal Offline Reinforcement Learning

Feb 19, 2021

A Momentum-Assisted Single-Timescale Stochastic Approximation Algorithm for Bilevel Optimization

Feb 15, 2021

Is Pessimism Provably Efficient for Offline RL?

Dec 30, 2020