Uri Koren

Policy Gradient with Tree Search: Avoiding Local Optimas through Lookahead

Jun 08, 2025

Uri Koren, Navdeep Kumar, Uri Gadot, Giorgia Ramponi, Kfir Yehuda Levy, Shie Mannor

Abstract: Classical policy gradient (PG) methods in reinforcement learning frequently converge to suboptimal local optima, a challenge exacerbated in large or complex environments. This work investigates Policy Gradient with Tree Search (PGTS), an approach that integrates an $m$-step lookahead mechanism to enhance policy optimization. We provide theoretical analysis demonstrating that increasing the tree search depth $m$ monotonically reduces the set of undesirable stationary points and, consequently, improves the worst-case performance of any resulting stationary policy. Critically, our analysis accommodates practical scenarios where policy updates are restricted to states visited by the current policy, rather than requiring updates across the entire state space. Empirical evaluations on diverse MDP structures, including Ladder, Tightrope, and Gridworld environments, illustrate PGTS's ability to exhibit "farsightedness," navigate challenging reward landscapes, escape local traps where standard PG fails, and achieve superior solutions.
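
To make the idea of an $m$-step lookahead concrete, here is a minimal tabular sketch. It is not the paper's PGTS algorithm: the abstract does not give the update rule, so the lookahead definition used here (apply the Bellman optimality backup $m-1$ times to the current policy's value function, then take one-step Q-values) is an assumption, and the three-state MDP, `policy_value`, and `lookahead_q` are hypothetical names invented for illustration.

```python
def policy_value(P, R, policy, gamma, iters=500):
    """Iterative policy evaluation for a tabular MDP.

    P[s][a][t] is the transition probability, R[s][a] the reward,
    policy[s][a] the action probability (all hypothetical inputs).
    """
    n = len(P)
    V = [0.0] * n
    for _ in range(iters):
        V = [
            sum(
                policy[s][a]
                * (R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n)))
                for a in range(len(P[s]))
            )
            for s in range(n)
        ]
    return V


def lookahead_q(P, R, V, gamma, m):
    """Assumed m-step lookahead: m-1 greedy backups of V, then one-step Q."""
    n = len(P)

    def q_from(values):
        return [
            [R[s][a] + gamma * sum(P[s][a][t] * values[t] for t in range(n))
             for a in range(len(P[s]))]
            for s in range(n)
        ]

    for _ in range(m - 1):
        V = [max(row) for row in q_from(V)]
    return q_from(V)


# Toy chain: state 0 has a small "stay" reward (action 0); reaching the
# rewarding absorbing state 2 requires two consecutive "go" actions (action 1).
P = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],  # state 0: stay / go to state 1
    [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],  # state 1: back to 0 / go to state 2
    [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],  # state 2: absorbing
]
R = [[0.1, 0.0], [0.0, 0.0], [1.0, 1.0]]
gamma = 0.9
myopic_policy = [[1.0, 0.0]] * 3  # always take action 0

V_pi = policy_value(P, R, myopic_policy, gamma)
q1 = lookahead_q(P, R, V_pi, gamma, m=1)
q2 = lookahead_q(P, R, V_pi, gamma, m=2)
```

In this toy setup the one-step values at state 0 rank "stay" above "go", because the current policy would walk straight back from state 1, while the two-step values rank "go" higher. That is the kind of farsightedness the abstract attributes to deeper lookahead, shown here only under the stated assumptions.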

State Entropy Regularization for Robust Reinforcement Learning

Jun 08, 2025

Uri Koren, Yonatan Ashlag, Mirco Mutti, Esther Derman, Pierre-Luc Bacon, Shie Mannor

Abstract: State entropy regularization has empirically shown better exploration and sample complexity in reinforcement learning (RL). However, its theoretical guarantees have not been studied. In this paper, we show that state entropy regularization improves robustness to structured and spatially correlated perturbations. These types of variation are common in transfer learning but often overlooked by standard robust RL methods, which typically focus on small, uncorrelated changes. We provide a comprehensive characterization of these robustness properties, including formal guarantees under reward and transition uncertainty, as well as settings where the method performs poorly. Much of our analysis contrasts state entropy with the widely used policy entropy regularization, highlighting their different benefits. Finally, from a practical standpoint, we illustrate that compared with policy entropy, the robustness advantages of state entropy are more sensitive to the number of rollouts used for policy evaluation.
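
The distinction between the two regularizers can be illustrated with a small sketch. This is not the paper's estimator or training objective: it assumes a discrete state space, estimates state entropy as the Shannon entropy of the empirical state-visitation distribution over sampled rollouts, and uses hypothetical helper names (`state_entropy`, `mean_policy_entropy`) and made-up trajectories.

```python
import math
from collections import Counter


def shannon_entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def state_entropy(trajectories):
    """Entropy of the empirical state-visitation distribution.

    trajectories is a list of rollouts, each a list of hashable states.
    """
    counts = Counter(s for traj in trajectories for s in traj)
    total = sum(counts.values())
    return shannon_entropy(c / total for c in counts.values())


def mean_policy_entropy(policy):
    """Average entropy of the per-state action distributions."""
    return sum(shannon_entropy(p) for p in policy.values()) / len(policy)


# A policy that is uniformly random in every state (maximal policy entropy) ...
uniform_policy = {"A": [0.5, 0.5], "B": [0.5, 0.5], "C": [0.5, 0.5]}

# ... can still produce rollouts that never leave one state, or that spread out.
narrow_rollouts = [["A", "A", "A"], ["A", "A", "A"]]
wide_rollouts = [["A", "B", "C"], ["C", "B", "A"]]

h_policy = mean_policy_entropy(uniform_policy)
h_narrow = state_entropy(narrow_rollouts)
h_wide = state_entropy(wide_rollouts)
```

Because the state-entropy estimate here is built from sampled rollouts, its quality depends on how many rollouts are available, whereas policy entropy can be read directly off the action distributions. This may help explain why the abstract reports state entropy's advantages as more sensitive to the number of rollouts, though the sketch makes no claim about the paper's actual evaluation procedure.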
