Shie Mannor

Faculty of Electrical Engineering, Technion, Israel Institute of Technology

RL in Latent MDPs is Tractable: Online Guarantees via Off-Policy Evaluation

Jun 03, 2024

On Bits and Bandits: Quantifying the Regret-Information Trade-off

May 26, 2024

Tree Search-Based Policy Optimization under Stochastic Execution Delay

Apr 08, 2024

On the Global Convergence of Policy Gradient in Average Reward Markov Decision Processes

Mar 11, 2024

Conservative DDPG -- Pessimistic RL without Ensemble

Mar 08, 2024

Exploration-Driven Policy Optimization in RLHF: Theoretical Insights on Efficient Data Utilization

Feb 15, 2024

Improving Token-Based World Models with Parallel Observation Prediction

Feb 13, 2024

SQT -- std $Q$-target

Feb 12, 2024

MinMaxMin $Q$-learning

Feb 12, 2024

Prospective Side Information for Latent MDPs

Oct 11, 2023