Shie Mannor

Faculty of Electrical Engineering, Technion, Israel Institute of Technology

Maximizing the Total Reward via Reward Tweaking

Feb 09, 2020

Stabilizing Off-Policy Reinforcement Learning with Conservative Policy Gradients

Oct 02, 2019

Natural Language State Representation for Reinforcement Learning

Oct 02, 2019

Multi-Step Greedy and Approximate Real Time Dynamic Programming

Sep 10, 2019

Off-Policy Evaluation in Partially Observable Environments

Sep 09, 2019

Adaptive Trust Region Policy Optimization: Global Convergence and Faster Rates for Regularized MDPs

Sep 06, 2019

Practical Risk Measures in Reinforcement Learning

Aug 22, 2019

Topic Modeling via Full Dependence Mixtures

Jun 13, 2019

Variance Estimation For Online Regression via Spectrum Thresholding

Jun 13, 2019

Batch-Size Independent Regret Bounds for the Combinatorial Multi-Armed Bandit Problem

May 29, 2019