Paavo Parmas

On Advantage Estimates for Max@K Policy Gradients

Jun 04, 2026

Retry Policy Gradients in Continuous Action Spaces

Jun 04, 2026

OrderGrad: Optimizing Beyond the Mean with Order-Statistic Policy Gradient Estimation

Jun 04, 2026

Emergence of Exploration in Policy Gradient Reinforcement Learning via Retrying

May 29, 2026

Finite-Time Regret Analysis of Retry-Aware Bandits

May 20, 2026

Does "Do Differentiable Simulators Give Better Policy Gradients?" Give Better Policy Gradients?

Apr 20, 2026

Double Horizon Model-Based Policy Optimization

Dec 17, 2025

Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form

Sep 02, 2024

A unified view of likelihood ratio and reparameterization gradients

May 31, 2021

A unified view of likelihood ratio and reparameterization gradients and an optimal importance sampling scheme

Oct 14, 2019