Zhengyuan Zhou

Optimal Diagonal Preconditioning: Theory and Practice

Sep 02, 2022
Via arXiv

Learning to Order for Inventory Systems with Lost Sales and Uncertain Supplies

Jul 10, 2022
Via arXiv

Doubly Robust Distributionally Robust Off-Policy Evaluation and Learning

Feb 19, 2022
Via arXiv

Optimal No-Regret Learning in Strongly Monotone Games with Bandit Feedback

Dec 08, 2021
Via arXiv

Computational Benefits of Intermediate Rewards for Hierarchical Planning

Jul 08, 2021
Via arXiv

Distributed stochastic optimization with large delays

Jul 06, 2021
Via arXiv

Policy Learning with Adaptively Collected Data

May 05, 2021
Via arXiv

Simple Agent, Complex Environment: Efficient Reinforcement Learning with Agent State

Mar 08, 2021
Via arXiv

No Discounted-Regret Learning in Adversarial Bandits with Delays

Mar 08, 2021
Via arXiv

Doubly-Adaptive Thompson Sampling for Multi-Armed and Contextual Bandits

Feb 25, 2021
Via arXiv