Zhaoran Wang

End-to-End Learning and Intervention in Games

Oct 26, 2020

Variational Dynamic for Self-Supervised Exploration in Deep Reinforcement Learning

Oct 17, 2020

Provable Fictitious Play for General Mean-Field Games

Oct 08, 2020

Nearly Dimension-Independent Sparse Linear Bandit over Small Action Spaces via Best Subset Selection

Sep 04, 2020

Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth Nonlinear TD Learning

Aug 23, 2020

Global Convergence of Policy Gradient for Linear-Quadratic Mean-Field Control/Game in Continuous Time

Aug 16, 2020

Single-Timescale Actor-Critic Provably Finds Globally Optimal Policy

Aug 02, 2020

A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic

Jul 10, 2020

Accelerating Nonconvex Learning via Replica Exchange Langevin Diffusion

Jul 04, 2020

Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach

Jul 02, 2020