Learning-based approaches, particularly reinforcement learning (RL), have become widely used for developing control policies for autonomous agents, such as locomotion policies for legged robots. RL training typically maximizes a predefined reward (or minimizes a corresponding cost/loss) by iteratively optimizing policies within a simulator. Starting from a randomly initialized policy, the empirical expected reward follows a trajectory with a generally increasing trend. Although some policies become temporarily stuck in local optima, a well-designed training process usually converges to a reward plateau around which the reward continues to oscillate noisily. However, selecting a policy for real-world deployment is rarely an analytical decision (i.e., simply choosing the one with the highest reward) and is instead often performed through trial and error. To improve sim-to-real transfer, most research focuses on the pre-convergence stage, employing techniques such as domain randomization, multi-fidelity training, adversarial training, and architectural innovations. These methods, however, do not eliminate the inevitable convergence trajectory or the noisy oscillations of the reward, leading to heuristic policy selection or cherry-picking. This paper addresses the post-convergence sim-to-real transfer problem by introducing a worst-case performance transference optimization approach, formulated as a convex quadratically constrained linear programming problem. Extensive experiments demonstrate its effectiveness in transferring RL-based locomotion policies from simulation to real-world laboratory tests.
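To make the optimization class concrete, the following is a minimal sketch of a generic convex quadratically constrained linear program solved with CVXPY; it is not the paper's actual formulation, and the problem data (`c`, `P`, `q`, `r`) and the simplex-style weighting of candidate policies are illustrative assumptions only.

```python
import cvxpy as cp
import numpy as np

# Illustrative problem data (hypothetical, not from the paper)
n = 4                                      # e.g., number of candidate post-convergence policies
rng = np.random.default_rng(0)

c = rng.standard_normal(n)                 # linear objective coefficients (worst-case performance proxy)
A = rng.standard_normal((n, n))
P = A.T @ A + np.eye(n)                    # positive-definite matrix -> convex quadratic constraint
q = rng.standard_normal(n)
r = 5.0                                    # quadratic constraint bound

x = cp.Variable(n)                         # decision variable, e.g., policy selection weights

# Linear objective with convex quadratic and linear constraints (a convex QCLP)
objective = cp.Maximize(c @ x)
constraints = [
    cp.quad_form(x, P) + q @ x <= r,       # convex quadratic constraint
    x >= 0,                                # hypothetical: weights are nonnegative
    cp.sum(x) == 1,                        # hypothetical: weights sum to one
]

problem = cp.Problem(objective, constraints)
problem.solve()                            # convex program; handled by CVXPY's default solvers

print("Optimal value:", problem.value)
print("Optimal weights:", x.value)
```

Because the objective is linear and every constraint is either linear or a convex quadratic, the problem is convex and can be solved to global optimality, which is what makes this formulation attractive for a principled (rather than trial-and-error) policy selection step.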