Title: Reparameterization Proximal Policy Optimization

Aug 08, 2025

Hai Zhong, Xun Wang, Zhuoran Li, Longbo Huang

Abstract: Reparameterization policy gradient (RPG) is promising for improving sample efficiency by leveraging differentiable dynamics. However, a critical barrier is its training instability, where high-variance gradients can destabilize the learning process. To address this, we draw inspiration from Proximal Policy Optimization (PPO), which uses a surrogate objective to enable stable sample reuse in the model-free setting. We first establish a connection between this surrogate objective and RPG, which has been largely unexplored and is non-trivial. Then, we bridge this gap by demonstrating that the reparameterization gradient of a PPO-like surrogate objective can be computed efficiently using backpropagation through time. Based on this key insight, we propose Reparameterization Proximal Policy Optimization (RPO), a stable and sample-efficient RPG-based method. RPO enables multiple epochs of stable sample reuse by optimizing a clipped surrogate objective tailored for RPG, while being further stabilized by Kullback-Leibler (KL) divergence regularization and remaining fully compatible with existing variance reduction methods. We evaluate RPO on a suite of challenging locomotion and manipulation tasks, where experiments demonstrate that our method achieves superior sample efficiency and strong performance.
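
For readers unfamiliar with the surrogate objective the abstract refers to, the following is a minimal sketch of the standard model-free PPO clipped surrogate, mean over samples of min(r * A, clip(r, 1 - eps, 1 + eps) * A), where r is the probability ratio between the new and old policy and A is the advantage estimate. It is not the RPG-tailored objective that RPO optimizes, which the abstract does not spell out. The function name `clipped_surrogate` and the default `clip_eps=0.2` are illustrative assumptions, not taken from the paper.

```python
def clipped_surrogate(ratios, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate, averaged over samples.

    ratios:     new-policy / old-policy probability ratios, one per sample
    advantages: advantage estimates, one per sample
    clip_eps:   clipping half-width (0.2 is an assumed, commonly used value)
    """
    if len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must have the same length")
    if not ratios:
        raise ValueError("at least one sample is required")

    total = 0.0
    for r, adv in zip(ratios, advantages):
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped_r = max(1.0 - clip_eps, min(r, 1.0 + clip_eps))
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(r * adv, clipped_r * adv)
    return total / len(ratios)
```

With a positive advantage, a ratio above 1 + eps earns no extra credit, and with a negative advantage, a ratio below 1 - eps earns no extra credit either. This removes the incentive to move the policy far from the one that collected the data, which is what allows the same samples to be reused for several epochs.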
