Shie Mannor

Faculty of Electrical Engineering, Technion, Israel Institute of Technology

PlaMo: Plan and Move in Rich 3D Physical Environments

Jun 26, 2024

RL in Latent MDPs is Tractable: Online Guarantees via Off-Policy Evaluation

Jun 03, 2024

On Bits and Bandits: Quantifying the Regret-Information Trade-off

May 26, 2024

Tree Search-Based Policy Optimization under Stochastic Execution Delay

Apr 08, 2024

On the Global Convergence of Policy Gradient in Average Reward Markov Decision Processes

Mar 11, 2024

Conservative DDPG -- Pessimistic RL without Ensemble

Mar 08, 2024

Exploration-Driven Policy Optimization in RLHF: Theoretical Insights on Efficient Data Utilization

Feb 15, 2024

Improving Token-Based World Models with Parallel Observation Prediction

Feb 13, 2024

MinMaxMin $Q$-learning

Feb 12, 2024

SQT -- std $Q$-target

Feb 12, 2024