Zheqing Zhu

Uncertainty of Joint Neural Contextual Bandit

Jun 04, 2024
Via arXiv

Pearl: A Production-ready Reinforcement Learning Agent

Dec 06, 2023
Via arXiv

Non-Stationary Contextual Bandit Learning via Neural Predictive Ensemble Sampling

Oct 14, 2023
Via arXiv

Offline Reinforcement Learning for Optimizing Production Bidding Policies

Oct 13, 2023
Via arXiv

Scalable Neural Contextual Bandit for Recommender Systems

Jun 26, 2023
Via arXiv

IQL-TD-MPC: Implicit Q-Learning for Hierarchical Model Predictive Control

Jun 01, 2023
Via arXiv

Optimizing Long-term Value for Auction-Based Recommender Systems via On-Policy Reinforcement Learning

May 24, 2023
Via arXiv

Optimism Based Exploration in Large-Scale Recommender Systems

Apr 05, 2023
Via arXiv

Deep Exploration for Recommendation Systems

Sep 26, 2021
Via arXiv

Multi-Agent Safe Planning with Gaussian Processes

Aug 10, 2020
Via arXiv