Clarissa Costen

Return Capping: Sample-Efficient CVaR Policy Gradient Optimisation

Apr 29, 2025

Harry Mead, Clarissa Costen, Bruno Lacerda, Nick Hawes

Abstract: When optimising for conditional value at risk (CVaR) using policy gradients (PG), current methods rely on discarding a large proportion of trajectories, resulting in poor sample efficiency. We propose a reformulation of the CVaR optimisation problem that caps the total return of trajectories used in training, rather than simply discarding them, and show that this is equivalent to the original problem if the cap is set appropriately. We show, with empirical results in a number of environments, that this reformulation consistently improves performance compared to baselines.
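
The equivalence mentioned in the abstract can be illustrated numerically. The sketch below is not the authors' implementation: it assumes the cap is set to the empirical value at risk (VaR) of a batch of returns and uses the standard Rockafellar-Uryasev identity, CVaR = cap * (1 - 1/alpha) + E[min(R, cap)] / alpha, to recover CVaR from capped returns. The function names and the toy batch of returns are hypothetical, chosen only for illustration.

```python
import numpy as np


def tail_size(n, alpha):
    """Number of trajectories in the worst alpha-fraction of a batch of size n."""
    return max(1, int(np.ceil(alpha * n)))


def empirical_var(returns, alpha):
    """Largest return inside the worst alpha-fraction (lower-tail VaR)."""
    ordered = np.sort(np.asarray(returns, dtype=float))
    return ordered[tail_size(len(ordered), alpha) - 1]


def empirical_cvar(returns, alpha):
    """Mean of the worst alpha-fraction of returns; all others are discarded."""
    ordered = np.sort(np.asarray(returns, dtype=float))
    return ordered[: tail_size(len(ordered), alpha)].mean()


def cap_returns(returns, cap):
    """Cap every return at `cap` instead of discarding trajectories above it."""
    return np.minimum(np.asarray(returns, dtype=float), cap)


def cvar_from_capped(returns, alpha, cap):
    """Recover CVaR from the mean capped return (assumes cap equals the VaR)."""
    return cap * (1.0 - 1.0 / alpha) + cap_returns(returns, cap).mean() / alpha


# Hypothetical batch of ten trajectory returns and a risk level of 20%.
returns = np.arange(10.0)
alpha = 0.2

cap = empirical_var(returns, alpha)
print("cap (empirical VaR):", cap)
print("CVaR from worst trajectories only:", empirical_cvar(returns, alpha))
print("CVaR from all capped trajectories:", cvar_from_capped(returns, alpha, cap))
```

In this toy batch the discard-based estimate uses only two of the ten returns, whereas the capped estimate is computed from all ten. The two values agree here because alpha times the batch size is an integer and the cap equals the empirical VaR; how the cap is actually chosen and updated during training is specified in the paper, not in this sketch.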
