Xuezhou Zhang

Bandit Theory and Thompson Sampling-Guided Directed Evolution for Sequence Optimization

Jun 05, 2022
Via arXiv

Byzantine-Robust Online and Offline Distributed Reinforcement Learning

Jun 01, 2022
Via arXiv

Provable Benefits of Representational Transfer in Reinforcement Learning

May 29, 2022
Via arXiv

Off-Policy Fitted Q-Evaluation with Differentiable Function Approximators: Z-Estimation and Inference Theory

Feb 10, 2022
Via arXiv

Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning Approach

Feb 02, 2022
Via arXiv

Optimal Estimation of Off-Policy Policy Gradient via Double Fitted Iteration

Jan 31, 2022
Via arXiv

Representation Learning for Online and Offline RL in Low-rank MDPs

Oct 09, 2021
Via arXiv

Corruption-Robust Offline Reinforcement Learning

Jun 11, 2021
Via arXiv

Controllable and Diverse Text Generation in E-commerce

Feb 23, 2021
Via arXiv

Reward Poisoning in Reinforcement Learning: Attacks Against Unknown Learners in Unknown Environments

Feb 16, 2021
Via arXiv